Why is the logistic regression hypothesis seen as a probability function?

I understand that we use it to predict 0 or 1, but why may a function (the hypothesis) that outputs numbers between 0 and 1 be considered a probability function?

Is this a heuristic?

Sycorax
user_anon
  • this post may be helpful. Check the accepted answer. http://stats.stackexchange.com/questions/229645/why-there-are-two-different-logistic-loss-formulation-notations – Haitao Du Sep 06 '16 at 14:55

1 Answer

No, it's not merely a heuristic. It's quite deliberately intended to be a model for the conditional distribution of the response.

Logistic regression is a particular case of a generalized linear model (GLM), in this case for a process where the response variable is conditionally Bernoulli (or more generally, binomial).

A GLM includes a specification of a model for the conditional mean of the response. In the case of a Bernoulli variable, its conditional mean is the parameter $p_i$, which is explicitly the probability that the response, $Y_i$, is $1$. It is modeled in terms of one or more predictors. Here's the model for the mean for a single predictor, $x_i$:

$$P(Y_i=1|x_i)=\frac{\exp(\beta_0+\beta_1x_i)}{1+\exp(\beta_0+\beta_1x_i)}$$

So it is (intentionally) a model for the probability that the response is $1$, given the value of the predictors.
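As a small illustration of that formula, here is a minimal Python sketch that evaluates $P(Y_i=1|x_i)$ for a single predictor. The function name `logistic_probability` and the coefficient values are hypothetical, chosen only for the example; in practice $\beta_0$ and $\beta_1$ would be estimated by maximum likelihood.

```python
import math


def logistic_probability(x, beta0, beta1):
    """Return P(Y=1 | x) under a single-predictor logistic model."""
    eta = beta0 + beta1 * x  # linear predictor
    # Algebraically equal to exp(eta) / (1 + exp(eta)); the two branches
    # avoid overflow in exp() when |eta| is large.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Hypothetical coefficients, for illustration only.
beta0, beta1 = -1.0, 0.5

for x in (-10.0, 0.0, 2.0, 10.0):
    p = logistic_probability(x, beta0, beta1)
    # p is the modeled probability of a 1; 1 - p is the probability of a 0.
    print(f"x={x:6.1f}  P(Y=1|x)={p:.4f}  P(Y=0|x)={1 - p:.4f}")
```

For any finite $x_i$ the result lies strictly between 0 and 1, and the probabilities of the two outcomes sum to 1, which is what a Bernoulli model for $Y_i$ requires.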

The form of the link function $\eta=\log(p/(1-p))$ (and its inverse $p=\exp(\eta)/(1+\exp(\eta))$) is no accident either -- the logit link (which is what makes it logistic regression) is the natural (or canonical) link function for a binomial response. Other choices of link function are possible (and they, too, will be models for the probability of a 1). Other common choices for a binomial response are the probit and the complementary log-log, but the logistic is by far the most common.
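To make the comparison of link functions concrete, here is a rough Python sketch of the three inverse links mentioned above, each mapping a linear predictor $\eta$ to a probability. The function names are my own (not from any particular library), and the probit inverse is written via the standard normal CDF using `math.erf`.

```python
import math


def inv_logit(eta):
    """Inverse logit link: p = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit link: the standard normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


# Each inverse link turns the same linear predictor into a probability.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(
        f"eta={eta:5.1f}  logit={inv_logit(eta):.4f}  "
        f"probit={inv_probit(eta):.4f}  cloglog={inv_cloglog(eta):.4f}"
    )
```

All three are increasing in $\eta$ and return values in $(0, 1)$, so each gives a model for the probability of a 1. The logit and probit are symmetric about $\eta = 0$ (both give $0.5$ there), while the complementary log-log is not.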

Glen_b
  • Correct me if I am wrong but the function should be a proper scoring rule. In such cases it will reflect the probability. – Cagdas Ozgenc Sep 06 '16 at 13:27
  • @Cagdas The criterion being optimized in estimating GLMs is likelihood (though MLE can be seen as a special case of optimal score estimation). Are you casting the modelling of Bernoulli variables as a forecasting problem here? – Glen_b Sep 06 '16 at 14:02