
I'm doing a logistic regression, which I understand I can do by simply saying

$$ \operatorname{logit}(Y)=\beta_0+\beta_1 x+\varepsilon $$

where $\varepsilon$ is normally distributed around $0$. Then we can use the usual OLS methodology to fit the $\beta$s, and setting $\varepsilon = 0$ gives us our best estimate $\widehat{\operatorname{logit}(Y)}$.

My question is: how can we find $\hat Y$ from here? I don't think it is as simple as $\hat Y=\operatorname{logit}^{-1}\left(\widehat{\operatorname{logit}(Y)}\right)$, because I know that in the analogous log-transformed case, $\hat Y=\exp\left(\widehat{\log(Y)}+\frac{1}{2}\sigma^2\right)$.

I looked up the logit-normal distribution (https://en.wikipedia.org/wiki/Logit-normal_distribution), but it says that there's no analytical solution for the mean of such a distribution. I think I must be missing something, though, because what good is logistic regression if not to estimate $Y$?

T.J. Gaffney
  • It might help to review the basic concepts; in [logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) the logit transform is of the mean, rather than of the data (which means it works on data consisting only of 0 and 1, for example). See also the sections on the generalized linear model relating to [intuition](https://en.wikipedia.org/wiki/Generalized_linear_model#Intuition) and the following overview section. There are many useful posts on site relating to logistic regression – Glen_b May 29 '17 at 02:51
  • Possible duplicates: [How to specify a logistic regression as a transformed linear regression](https://stats.stackexchange.com/questions/162251/how-to-specify-logistic-regression-as-transformed-linear-regression) and [logit link in glm and inverse logit](https://stats.stackexchange.com/questions/262019/logit-link-in-glm-and-inverse-logit) – Glen_b May 29 '17 at 03:05

1 Answer


Your understanding of logistic regression has some errors.

The logistic regression equation is

$$ \operatorname{logit}(E(Y))=\beta_0+\beta_1 x $$

Notice that there is no random part of the model on the right-hand side. The linear part specifies the logit of the expected value of $Y$ exactly.

The randomness comes from how $Y$ disperses around its expectation. To write the model explicitly in your style, you would have to write something like

$$ Y \mid x \sim \operatorname{Bernoulli}\left(p = \operatorname{logit}^{-1}(\beta_0+\beta_1 x) \right) $$
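To make the role of the Bernoulli distribution concrete, here is a small simulation sketch. The coefficient values, the value of `x`, and the helper name `inv_logit` are made up for illustration. It draws many 0/1 outcomes at a single `x` and compares their average to $\operatorname{logit}^{-1}(\beta_0+\beta_1 x)$, which is the expected value of $Y$ under this model.

```python
import numpy as np

def inv_logit(z):
    """Inverse of the logit function: maps a real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical coefficients and covariate value, chosen only for illustration.
beta0, beta1 = -0.5, 1.2
x = 0.8

# The linear predictor fixes the success probability exactly; no noise term.
p = inv_logit(beta0 + beta1 * x)

# All randomness lives in the Bernoulli draw of Y given x.
y = rng.binomial(1, p, size=200_000)

sample_mean = y.mean()
print(f"p = {p:.4f}, sample mean of Y = {sample_mean:.4f}")
```

The data are only ever 0 or 1, so $\operatorname{logit}(Y)$ itself would be infinite; it is the mean of $Y$ that gets transformed.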

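As a consequence, you cannot use OLS to fit a logistic regression. Logistic regressions are fit by maximum likelihood using iterative optimization, usually based on Newton's method. Once the coefficients are estimated, the fitted probability is simply $\hat p = \operatorname{logit}^{-1}(\hat\beta_0+\hat\beta_1 x)$; since there is no $\varepsilon$ inside the logit, no correction term like the $\frac{1}{2}\sigma^2$ in the log-normal case is needed.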
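As a rough illustration of what the iterative fit looks like, here is a minimal Newton's method sketch on simulated data. It is not how any particular library implements it. The function name `fit_logistic_newton`, the simulated data, and the "true" coefficients are all assumptions for the example; in practice you would use an established GLM routine.

```python
import numpy as np

def inv_logit(z):
    """Inverse of the logit function: maps a real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(x, y, n_iter=25, tol=1e-10):
    """Maximize the Bernoulli log-likelihood with Newton's method.

    Hypothetical helper for illustration; assumes one predictor x
    and a 0/1 response y.
    """
    X = np.column_stack([np.ones_like(x), x])  # intercept column plus x
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = inv_logit(X @ beta)
        grad = X.T @ (y - p)                        # score vector
        info = X.T @ (X * (p * (1 - p))[:, None])   # negative Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data with made-up coefficients, only to exercise the fit.
rng = np.random.default_rng(1)
true_beta = np.array([-0.5, 1.2])
x = rng.normal(size=5000)
y = rng.binomial(1, inv_logit(true_beta[0] + true_beta[1] * x)).astype(float)

beta_hat = fit_logistic_newton(x, y)

# Fitted probabilities: apply the inverse logit to the fitted linear predictor.
p_hat = inv_logit(beta_hat[0] + beta_hat[1] * x)
print("estimated coefficients:", beta_hat)
```

The last step is the answer to "how do I get $\hat Y$": apply $\operatorname{logit}^{-1}$ to the fitted linear predictor to get an estimate of $E(Y \mid x)$.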

Matthew Drury