
On whether an error term exists in logistic regression (and its assumed distribution), I have read in various places that:

  1. no error term exists
  2. the error term has a binomial distribution (in accordance with the distribution of the response variable)
  3. the error term has a logistic distribution

Can someone please clarify?

user61124
  • With logistic regression - or indeed GLMs more generally - it's typically not useful to think of the observation $y_i|\mathbf{x}$ as "mean + error"; better to think in terms of the conditional distribution. I wouldn't go so far as to say 'no error term exists' so much as 'it's just not helpful to think in those terms'. So rather than a choice between 1. and 2., I'd say it's generally better to answer "none of the above". However, irrespective of the degree to which one might argue for "1." or "2.", "3." is definitely wrong. Where did you see that? – Glen_b Nov 20 '14 at 13:52
  • @Glen_b: Might one argue for (2)? I've known people to say it but never to defend it when it's questioned. – Scortchi - Reinstate Monica Nov 20 '14 at 14:49
  • @Glen_b All three statements have constructive interpretations in which they are true. (3) is addressed at http://en.wikipedia.org/wiki/Logistic_distribution#Applications and http://en.wikipedia.org/wiki/Discrete_choice#Binary_Choice. – whuber Nov 20 '14 at 20:11
  • @whuber: I've corrected my answer wrt (3), which wasn't well thought through; but still puzzled about in what sense (2) might be right. – Scortchi - Reinstate Monica Nov 20 '14 at 21:27
  • @Scortchi Although you are right that (2) is incorrect, if we interpret it as saying that the difference between an observation and its expectation has a Binomial distribution *translated by the expectation*, then it will be (trivially) correct. The parenthetical remark in (2) strongly suggests this is the intended interpretation. Note that other useful "error terms" can be defined, too, such as the $\chi^2$ and deviance error terms described in Hosmer & Lemeshow (and, subject to suitable caveats discussed there, their squares have approximate $\chi^2$ distributions). – whuber Nov 20 '14 at 21:42

4 Answers


In linear regression observations are assumed to follow a Gaussian distribution with a mean parameter conditional on the predictor values. If you subtract the mean from the observations you get the errors: Gaussian with mean zero, & independent of the predictor values; that is, the errors at any set of predictor values follow the same distribution.

In logistic regression observations $y\in\{0,1\}$ are assumed to follow a Bernoulli distribution† with a mean parameter (a probability) conditional on the predictor values. So for any given predictor values determining a mean $\pi$, there are only two possible errors: $1-\pi$ occurring with probability $\pi$, & $0-\pi$ occurring with probability $1-\pi$. For other predictor values the errors will be $1-\pi'$ occurring with probability $\pi'$, & $0-\pi'$ occurring with probability $1-\pi'$. So there's no common error distribution independent of predictor values, which is why people say "no error term exists" (1).
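To make this concrete, here is a minimal simulation (the coefficients and the logit link below are hypothetical, chosen just for illustration) showing that, for each fixed set of predictor values, the error $y-\pi$ takes only two values, and that those values and their probabilities change with the predictors:

```python
import numpy as np
from scipy.special import expit  # inverse-logit

rng = np.random.default_rng(0)
alpha, beta = -1.0, 2.0            # hypothetical coefficients

for x in (0.0, 1.0):               # two different predictor values
    pi = expit(alpha + beta * x)   # conditional mean P(y = 1 | x)
    y = rng.binomial(1, pi, size=100_000)
    errors = y - pi                # the "error" y - E[y | x]
    values, counts = np.unique(np.round(errors, 6), return_counts=True)
    print(f"x={x}: pi={pi:.3f}, error values={values}, "
          f"relative freqs={counts / counts.sum()}")

# The error takes only the two values 1 - pi and -pi, with probabilities
# pi and 1 - pi, so its distribution changes with the predictor values.
```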

"The error term has a binomial distribution" (2) is just sloppiness—"Gaussian models have Gaussian errors, ergo binomial models have binomial errors". (Or, as @whuber points out, it could be taken to mean "the difference between an observation and its expectation has a binomial distribution translated by the expectation".)

"The error term has a logistic distribution" (3) arises from the derivation of logistic regression from the model where you observe whether or not a latent variable with errors following a logistic distribution exceeds some threshold. So it's not the same error defined above. (It would seem an odd thing to say IMO outside that context, or without explicit reference to the latent variable.)

† If you have $k$ observations with the same predictor values, giving the same probability $\pi$ for each, then their sum $\sum y$ follows a binomial distribution with probability $\pi$ and number of trials $k$. Considering $\sum y -k\pi$ as the error leads to the same conclusions.
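A short check of this footnote by simulation (the values of $k$ and $\pi$ are arbitrary): the sum of $k$ Bernoulli($\pi$) observations sharing the same predictor values matches the Binomial($k,\pi$) pmf.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(2)
k, pi = 10, 0.3                                   # hypothetical values
sums = rng.binomial(1, pi, size=(50_000, k)).sum(axis=1)

for s in range(k + 1):
    # empirical frequency of sum(y) = s vs the Binomial(k, pi) pmf
    print(s, round((sums == s).mean(), 4), round(binom.pmf(s, k, pi), 4))
```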

Scortchi - Reinstate Monica
  • Could you provide a simple example regarding the part 'no error term exists'? I'm having trouble understanding it the way it's written. – Quirik Aug 19 '17 at 09:53
  • @Scortchi I'm having trouble following the case when in practice the model is used with some threshold, say 0.5. Then the error is either 1 or 0. Can this then be considered a Bernoulli random variable with parameter $1-\pi$ when the true label is 1? – wabbit Jun 04 '19 at 16:22
  • Can you shed more light on what you mean by a mean parameter conditional on the predictor values? Which mean are you subtracting from the observation to get the error (error = actual - fitted)? Where does the mean come into the picture here? – star Feb 14 '21 at 16:47
  • Paragraph 2 seems flawed on 2 counts. 1) The statement that there are only 2 possible values for the error probably refers to a Linear Probability Model and not to logistic regression. 2) The reason why people say a logistic regression has no error term is not the one stated. The reason is that the randomness in a logistic model (along the lines of Frank Harrell's answer) comes from a random draw of the y from a Bernoulli distribution with a probability that is a function of the predictors; there literally is no error term. Contrast with a linear model: randomness comes from an additive error term. – ColorStatistics Jul 18 '21 at 11:33
  • @ColorStatistics: (1) If, for some predictor values $\vec{x}$, $\pi=f(\vec{x})=0.2$, then the possible errors are $0.8$ & $-0.2$, regardless of what kind of function $f$ is. So it doesn't matter whether we're talking about LPM or logistic regression, or probit regression, &c. (2) What if I were to insist the contrary: "The randomness in a logistic model comes from the additive error terms [defined in my answer]; whereas in a linear model the randomness comes from a random draw of the $y$ from a Gaussian distribution with a mean that is a function of the predictors"? How could ... – Scortchi - Reinstate Monica Jul 19 '21 at 01:16
  • ... we possibly settle the matter? – Scortchi - Reinstate Monica Jul 19 '21 at 01:17
  • Are you saying we can always look at any regression in these 2 ways: #1) additive error, #2) random draw of y? I guess I just haven't seen linear regression explained in terms of #2 but don't see why it wouldn't work. Help me understand this better. I can post this as a new, related question if you can help me identify the heart of what I am missing here. Thank you. – ColorStatistics Jul 19 '21 at 15:42

This has been covered before. A model that is constrained to have predicted values in $[0,1]$ cannot possibly have an additive error term that would make the predictions go outside $[0,1]$. Think of the simplest example of a binary logistic model -- a model containing only an intercept. This is equivalent to the Bernoulli one-sample problem, often called (in this simple case) the binomial problem because (1) all the information is contained in the sample size and number of events or (2) the Bernoulli distribution is a special case of the binomial distribution with $n=1$. The raw data in this situation are a series of binary values, and each has a Bernoulli distribution with unknown parameter $\theta$ representing the probability of the event. There is no error term in the Bernoulli distribution, there's just an unknown probability. The logistic model is a probability model.
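As a rough illustration of the intercept-only case (the value of $\theta$ is made up; statsmodels is used only for convenience): the fitted intercept of the intercept-only logistic model is just the logit of the sample proportion, and no error term appears anywhere in the model.

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import logit, expit

rng = np.random.default_rng(3)
theta = 0.25                                   # hypothetical event probability
y = rng.binomial(1, theta, size=1_000)         # Bernoulli one-sample data

# Intercept-only logistic model: no error term, just an unknown probability.
fit = sm.GLM(y, np.ones((y.size, 1)), family=sm.families.Binomial()).fit()

print(expit(fit.params[0]), y.mean())          # both estimate theta
print(fit.params[0], logit(y.mean()))          # intercept = logit(sample proportion)
```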

Frank Harrell

To me the unification of logistic, linear, Poisson regression etc. has always been in terms of the specification of the mean and variance in the Generalized Linear Model framework. We start by specifying a probability distribution for our data: normal for continuous data, Bernoulli for dichotomous, Poisson for counts, etc. Then we specify a link function that describes how the mean is related to the linear predictor (a brief code sketch follows the list of links below):

$g(\mu_i) = \alpha + x_i^T\beta$

For linear regression, $g(\mu_i) = \mu_i$.

For logistic regression, $g(\mu_i) = \log(\frac{\mu_i}{1-\mu_i})$.

For Poisson regression, $g(\mu_i) = \log(\mu_i)$.
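A minimal sketch of those three link functions and their inverses (the values of the linear predictor $\eta$ below are arbitrary):

```python
import numpy as np
from scipy.special import expit, logit

eta = np.array([-2.0, 0.0, 1.5])           # hypothetical linear predictor values

links = {
    "linear (identity)": (lambda mu: mu, lambda e: e),
    "logistic (logit)":  (logit,         expit),
    "Poisson (log)":     (np.log,        np.exp),
}

for name, (g, g_inv) in links.items():
    mu = g_inv(eta)                        # mean implied by the linear predictor
    print(name, mu, np.allclose(g(mu), eta))   # g(g^{-1}(eta)) == eta
```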

The only thing one might be able to consider in terms of writing an error term would be to state:

$y_i = g^{-1}(\alpha+x_i^T\beta) + e_i$ where $E(e_i) = 0$ and $Var(e_i) = \sigma^2(\mu_i)$. For example, for logistic regression, $\sigma^2(\mu_i) = \mu_i(1-\mu_i) = g^{-1}(\alpha+x_i^T\beta)(1-g^{-1}(\alpha+x_i^T\beta))$. But, you cannot explicitly state that $e_i$ has a Bernoulli distribution as mentioned above.
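To check that empirically (hypothetical coefficients, one fixed set of predictor values): the simulated $e_i$ below has mean approximately $0$ and variance approximately $\mu_i(1-\mu_i)$, but its support is $\{-\mu_i, 1-\mu_i\}$ rather than $\{0,1\}$, so it is indeed not Bernoulli.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(4)
alpha, beta = -0.5, 1.0                        # hypothetical coefficients
x0 = 0.8                                       # one fixed set of predictor values
mu = expit(alpha + beta * x0)

y = rng.binomial(1, mu, size=200_000)
e = y - mu                                     # e_i = y_i - g^{-1}(alpha + x_i' beta)

print("mean of e:", e.mean())                  # approximately 0
print("var of e:", e.var(), "mu(1-mu):", mu * (1 - mu))
print("support of e:", np.unique(np.round(e, 6)))   # {-mu, 1-mu}, not {0, 1}
```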

Note, however, that basic Generalized Linear Models only assume a structure for the mean and variance of the distribution. It can be shown that the estimating equations and the Hessian matrix only depend on the mean and variance you assume in your model. So you don't necessarily need to be concerned with the distribution of $e_i$ for this model because the higher order moments don't play a role in the estimation of the model parameters.
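One way to see that last point concretely is a bare-bones IRLS/Fisher-scoring sketch (simulated data with made-up coefficients): the updates below use only the mean function $g^{-1}$ and the variance function $\mu(1-\mu)$, with no further distributional assumption about $e_i$.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(5)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, -1.2])              # hypothetical coefficients
y = rng.binomial(1, expit(X @ beta_true))

beta = np.zeros(2)
for _ in range(25):                            # IRLS / Fisher scoring
    mu = expit(X @ beta)                       # mean function g^{-1}
    W = mu * (1 - mu)                          # variance function
    z = X @ beta + (y - mu) / W                # working response
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

print(beta, beta_true)                         # estimates close to the truth
```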

hard2fathom

  1. No errors exist. We are modeling the mean! The mean is just a true number.
  2. This doesn't make sense to me.
  3. Think of the response as arising from a latent variable. If you assume the error term is normally distributed, then the model becomes a probit model. If you assume the distribution of the error term is logistic, then the model is logistic regression (a sketch below illustrates both cases).
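A small simulation along those lines (the coefficients are made up; statsmodels' Probit and Logit are used): the same latent-variable construction with normal errors is fit by a probit model and with logistic errors by a logistic regression, each recovering the coefficients used to generate its latent variable.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 100_000
X = sm.add_constant(rng.normal(size=n))    # intercept + one predictor
beta = np.array([0.5, 1.0])                # hypothetical coefficients
eta = X @ beta

# Normal latent errors -> probit model; logistic latent errors -> logit model.
y_probit = (eta + rng.normal(size=n) > 0).astype(int)
y_logit = (eta + rng.logistic(size=n) > 0).astype(int)

print(sm.Probit(y_probit, X).fit(disp=0).params)   # approximately [0.5, 1.0]
print(sm.Logit(y_logit, X).fit(disp=0).params)     # approximately [0.5, 1.0]
```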
gung - Reinstate Monica
  • I fail to see how this helps one understand a probability model. Probability models are simpler than this makes it seem. – Frank Harrell Apr 02 '15 at 11:48