
While reading about the logistic regression model, I wondered about the link between the logit (i.e. $\log\frac{p}{1-p}$) and the probability of an event that is assumed to be binary and modeled with logistic regression.

I know that the inverse of this transformation (the logistic function) produces a value constrained to the interval $(0, 1)$ and can therefore be directly thought of as a probability, but I wonder whether there is a mathematical explanation for this.
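To be explicit, the pair of transformations I have in mind is (writing $z$ for the linear predictor, in my notation):

$$\operatorname{logit}(p) = \log\frac{p}{1-p} = z, \qquad p = \sigma(z) = \frac{1}{1+e^{-z}} \in (0,1).$$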

Can someone answer this question with a mathematical proof or a reference?

Quantopik

1 Answer


In section 4.2 of Pattern Recognition and Machine Learning (Springer, 2006), Bishop shows that the logistic (inverse-logit) form arises naturally as the posterior class probability in a Bayesian treatment of two-class classification with Gaussian class-conditional densities. He then goes on to show that the same holds for discretely distributed features, as well as for class-conditional densities from a subclass of the exponential family. For multi-class classification, the logistic function generalizes to the normalized exponential, or softmax, function. The value of the logistic or softmax function can therefore indeed be interpreted as a probability in a variety of settings: not as the frequentist probability of an event, but as the Bayesian posterior probability of an underlying cause (the class) given the data.
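Here is a sketch of the two-class argument, in my notation, with classes $C_1, C_2$ and data $x$. By Bayes' theorem,

$$
p(C_1 \mid x) = \frac{p(x \mid C_1)\,p(C_1)}{p(x \mid C_1)\,p(C_1) + p(x \mid C_2)\,p(C_2)}
= \frac{1}{1 + e^{-a}} = \sigma(a),
\qquad
a = \log\frac{p(x \mid C_1)\,p(C_1)}{p(x \mid C_2)\,p(C_2)}.
$$

So $a$ is exactly the log posterior odds, i.e. the logit of $p(C_1 \mid x)$, and applying the logistic sigmoid $\sigma$ recovers the posterior probability. Logistic regression models $a$ as a linear function of $x$, and Bishop's point is that this linear form follows automatically for the class-conditional densities mentioned above. For $K$ classes the same manipulation gives the softmax: $p(C_k \mid x) = \exp(a_k)/\sum_j \exp(a_j)$ with $a_k = \log\big(p(x \mid C_k)\,p(C_k)\big)$.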

A. Donda