3

I have fit my own logistic model and I am using it to calculate the probabilities of different outcomes. I would like to know how I can obtain the confidence intervals of these probabilities.

I am using scikit in Python, but I would be happy with just the mathematical answer.

Escachator
  • 1
    A fair amount has been written about this topic here: http://stats.stackexchange.com/questions/5304/why-is-there-a-difference-between-manually-calculating-a-logistic-regression-95?rq=1 – Sycorax Nov 09 '15 at 18:52
  • 1
    Couldn't find the answer there – Escachator Nov 09 '15 at 19:03

1 Answer

2

The typical logistic regression model is written as something like $$\log\left(\frac{\pi}{1-\pi}\right)=x^T\beta$$ where we model the log-odds by a linear combination of our predictor variables $x$. In the equation above $\pi$ would be the probability that you are interested in calculating a confidence interval for.

Now, rearranging terms, we know that we can estimate the probability $\pi$ as $$\hat\pi=\frac{e^{x^T\hat\beta}}{1+e^{x^T\hat\beta}}$$

where $\hat\beta$ are the estimated coefficients from your logistic regression.
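As a minimal sketch of this step, the transformation between log-odds and probability can be written in a few lines of Python. The coefficient values and the observation `x` below are made up for illustration, and the helper names `inv_logit` and `logit` are my own, not part of any library.

```python
import math

def inv_logit(eta: float) -> float:
    """Map a log-odds value eta = x^T beta to a probability in (0, 1)."""
    # Branch on the sign of eta so that exp() is never called on a large
    # positive argument, which would overflow.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def logit(p: float) -> float:
    """Map a probability p in (0, 1) back to its log-odds."""
    return math.log(p / (1.0 - p))

# Hypothetical fitted coefficients (intercept, slope) and one observation.
beta_hat = [-1.0, 0.5]
x = [1.0, 2.0]  # the leading 1.0 multiplies the intercept

eta_hat = sum(b * xi for b, xi in zip(beta_hat, x))  # x^T beta_hat
pi_hat = inv_logit(eta_hat)                          # estimated probability
```

Here the linear predictor works out to zero, so the estimated probability is one half.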

It should be noted that, since maximum likelihood estimates are invariant to transformation, $\hat\pi$ may also be considered the maximum likelihood estimate of $\pi$.

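The construction of the confidence interval then uses the fact that, in large samples, $$\frac{x^T\hat\beta-x^T\beta}{\hat{SE}}\stackrel{.}{\sim} z$$ where $z$ denotes a standard normal random variable and $$\hat{SE}=\sqrt{x^T(X^TWX)^{-1}x}$$ Here $X$ is the design matrix and $W$ is the diagonal matrix of weights from the fit, with diagonal entries $\hat\pi_i(1-\hat\pi_i)$.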
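The standard error can be computed directly with NumPy. This is a sketch under some assumptions: the function name `linear_predictor_se` is hypothetical, $W$ is taken to be the diagonal matrix with entries $\hat\pi_i(1-\hat\pi_i)$ (the usual weights for a logistic fit), `X` already contains a column of ones if the model has an intercept, and $\hat\beta$ comes from an unpenalized maximum likelihood fit. Note that scikit-learn's `LogisticRegression` applies L2 regularization by default, in which case this formula is only a rough guide.

```python
import numpy as np

def linear_predictor_se(X, beta_hat, x):
    """Standard error of x^T beta_hat for a maximum-likelihood logistic fit."""
    X = np.asarray(X, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    x = np.asarray(x, dtype=float)

    # Fitted probabilities for the rows of the design matrix.
    p = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    # Diagonal of W: pi_i * (1 - pi_i).
    w = p * (1.0 - p)
    # X^T W X, formed without materialising the full n-by-n diagonal matrix.
    info = X.T @ (X * w[:, None])
    # x^T (X^T W X)^{-1} x, using a linear solve instead of an explicit inverse.
    return float(np.sqrt(x @ np.linalg.solve(info, x)))

# Toy design matrix: intercept column plus one binary predictor.
X_toy = [[1, 0], [1, 1], [1, 0], [1, 1]]
se_toy = linear_predictor_se(X_toy, beta_hat=[0.0, 0.0], x=[1.0, 0.0])
```

With all coefficients at zero every weight is 0.25, so the toy result can be checked by hand against $\sqrt{x^T(X^TWX)^{-1}x}$.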

We can then construct a $(1-\alpha)$ confidence interval for $x^T\beta$ as $$(L,U)=\left(x^T\hat\beta-z_{\alpha/2}\hat{SE},\,x^T\hat\beta+z_{\alpha/2}\hat{SE}\right)$$ where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution. Because the inverse logit is monotone increasing, a $(1-\alpha)$ confidence interval for $\pi$ is therefore $$\left(\frac{e^L}{1+e^L},\,\frac{e^U}{1+e^U}\right)$$
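The interval construction can be sketched with the Python standard library alone. The function name `probability_ci` is hypothetical; it assumes you already have the point estimate $x^T\hat\beta$ (`eta_hat`) and its standard error (`se`).

```python
import math
from statistics import NormalDist

def _inv_logit(t: float) -> float:
    # Overflow-safe e^t / (1 + e^t).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def probability_ci(eta_hat: float, se: float, alpha: float = 0.05):
    """Wald-type (1 - alpha) confidence interval for pi = inv_logit(x^T beta)."""
    # Upper alpha/2 quantile of the standard normal (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Interval (L, U) on the log-odds scale.
    lower = eta_hat - z * se
    upper = eta_hat + z * se
    # Back-transform the endpoints; the inverse logit is monotone, so order is kept.
    return _inv_logit(lower), _inv_logit(upper)

# Example: log-odds estimate 0 with standard error 1.
lo, hi = probability_ci(0.0, 1.0)
```

The resulting interval always lies inside $(0, 1)$, but it is generally not symmetric around $\hat\pi$ unless $\hat\pi = 0.5$, as in this example.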


I have omitted some of the details above, assuming that you understand how logistic regression works in general, including its design matrix.

  • 1
    thanks for your answer! My only doubt in your answer is in the first formula where z appears. Is that z the z-score? Can you explain it a bit further? And what is the W of the next formula? Thanks again – Escachator Nov 10 '15 at 08:24
  • @Escachator that is the z-score. The distribution is approximately normal. –  Nov 10 '15 at 15:12
  • I understand that the sigmoid function is the cdf of the normal distribution, however, in this case the sigmoid function is the distribution itself, i.e. is the pdf. That is my first doubt. The other is that I am not sure how to understand that concrete formula, $(x^T\hat\beta-x^T\beta)/\hat{SE}$... Thanks! – Escachator Nov 10 '15 at 16:17