Why is the logistic regression hypothesis seen as a probability function?

Question

I understand that we use it to predict 0 or 1, but still, why may a function (the hypothesis) that outputs numbers between 0 and 1 be considered a probability function?

Is this a heuristic?

This post may be helpful; check the accepted answer: http://stats.stackexchange.com/questions/229645/why-there-are-two-different-logistic-loss-formulation-notations — Haitao Du, Sep 06 '16 at 14:55

Glen_b · Accepted Answer · 2020-07-26T22:53:23.523

No, it's not merely a heuristic. It's quite deliberately intended to be a model for the conditional distribution of the response.

Logistic regression is a particular case of a generalized linear model (GLM), in this case for a process where the response variable is conditionally Bernoulli (or more generally, binomial).

A GLM includes a specification of a model for the conditional mean of the response. In the case of a Bernoulli variable, its conditional mean is the parameter $p_i$, which is explicitly the probability that the response $Y_i$ is $1$. It is modeled in terms of one or more predictors. Here's the model for the mean for a single predictor, $x_i$:

$$P(Y_i=1|x_i)=\frac{\exp(\beta_0+\beta_1x_i)}{1+\exp(\beta_0+\beta_1x_i)}$$

So it is (intentionally) a model for the probability that the response is $1$, given the value of the predictors.
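To make that concrete, here is a minimal Python sketch of the single-predictor model above. The coefficient values, the value of `x`, and the function name `logistic_mean` are assumptions chosen purely for illustration; they don't come from any fitted model. It computes $P(Y=1|x)$ and then simulates Bernoulli responses with that parameter to show that the model output plays the role of the conditional mean.

```python
import math
import random

def logistic_mean(x, beta0, beta1):
    """P(Y = 1 | x) under the single-predictor logistic model."""
    eta = beta0 + beta1 * x                # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))    # same as exp(eta) / (1 + exp(eta))

# Hypothetical coefficients and predictor value (illustration only).
beta0, beta1 = -1.0, 2.0
x = 0.75

p = logistic_mean(x, beta0, beta1)         # about 0.622

# Simulate Bernoulli(p) responses: the sample mean should be close to p,
# because p is the conditional mean of Y given x.
rng = random.Random(0)
n = 100_000
draws = [1 if rng.random() < p else 0 for _ in range(n)]
empirical = sum(draws) / n

print(f"model probability: {p:.4f}")
print(f"simulated proportion of 1s: {empirical:.4f}")
```

The simulated proportion of 1s settles near `p`, which is what you'd expect if the hypothesis is read as the Bernoulli parameter rather than as an arbitrary score squashed into $(0,1)$.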

The form of the link function $\eta=\log(p/(1-p))$ (and its inverse $p=\exp(\eta)/(1+\exp(\eta))$) is no accident either: the logit link (which is what makes it logistic regression) is the natural (or canonical) link function for a binomial response. Other choices of link function are possible (and they too will be models for the probability of a 1). Other common choices for a binomial response are the probit and the complementary log-log, but the logistic is by far the most common.
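As a rough illustration of the three links mentioned, the sketch below writes out their inverse functions using only Python's standard library. The function names are my own and the grid of $\eta$ values is arbitrary. Each inverse link maps any real linear predictor into $(0,1)$, so each gives a model for the probability of a 1; they differ only in the shape of the curve.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: p = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse of the probit link: the standard normal CDF at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse of the complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

def logit(p):
    """The logit link itself: eta = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Arbitrary grid of linear-predictor values, for comparison only.
etas = [-3.0, -1.0, 0.0, 1.0, 3.0]
table = [(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta)) for eta in etas]

for eta, p_logit, p_probit, p_cloglog in table:
    print(f"eta={eta:+.1f}  logit={p_logit:.4f}  probit={p_probit:.4f}  cloglog={p_cloglog:.4f}")
```

Note that the logit and probit curves are symmetric about $\eta=0$ (both give $p=0.5$ there), while the complementary log-log is not.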

Correct me if I am wrong but the function should be a proper scoring rule. In such cases it will reflect the probability. — Cagdas Ozgenc, Sep 06 '16 at 13:27
@Cagdas The criterion being optimized in estimating GLMs is likelihood (though MLE can be seen as a special case of optimal score estimation). Are you casting the modelling of Bernoulli variables as a forecasting problem here? — Glen_b, Sep 06 '16 at 14:02
