I'm studying logistic regression to estimate the probability of default of SMEs. Fortunately, the event (a firm's default) is rare.

King and Zeng tell us that "logistic regression can sharply underestimate the probability of rare events" ("Logistic Regression in Rare Events Data", 2001). This is because the logistic regression coefficient estimates are biased in these situations.

Could someone tell me which paper proves that the logistic regression intercept is biased when the event (Y=1) is rare?
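To make the bias in question concrete, here is a minimal sketch for the intercept-only special case, which is an assumption chosen for illustration and not the general setting of the paper. With no regressors, the maximum likelihood intercept has a closed form, logit(k/n) for k events in n observations, so its expectation can be computed exactly from the binomial distribution and compared with the second-order (delta-method) approximation (π − 0.5) / (n π (1 − π)). The values `n = 200` and `pi = 0.05` are made up.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def exact_intercept_bias(n, pi):
    """E[logit(k/n)] - logit(pi) for k ~ Binomial(n, pi).

    The expectation is conditional on 0 < k < n, because the MLE of the
    intercept does not exist when the sample has no events or only events.
    """
    total_prob = 0.0
    expected = 0.0
    for k in range(1, n):
        prob = math.comb(n, k) * pi**k * (1.0 - pi) ** (n - k)
        total_prob += prob
        expected += prob * logit(k / n)
    return expected / total_prob - logit(pi)

def approx_intercept_bias(n, pi):
    """Second-order Taylor (delta-method) approximation of the same bias."""
    return (pi - 0.5) / (n * pi * (1.0 - pi))

# Hypothetical values: 200 firms, 5% true default probability.
n, pi = 200, 0.05
exact = exact_intercept_bias(n, pi)
approx = approx_intercept_bias(n, pi)
print(f"exact bias of intercept:  {exact:.4f}")
print(f"approximate bias:         {approx:.4f}")
```

Both numbers are negative whenever π < 0.5, so in this special case the intercept is underestimated, and the bias shrinks as n grows.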

Luca Dibo
  • Link for those interested: http://gking.harvard.edu/files/abs/0s-abs.shtml – Tim Dec 05 '14 at 18:24
  • Don't the McCullagh & Nelder GLM book cited in @Tim's link and the appendices in King et al's paper contain the proofs? – dimitriy Dec 05 '14 at 19:31
  • Thanks @Dimitriy, I saw the proof in the appendices, but I hoped to find a simpler one. I'll check the one in the McCullagh & Nelder book cited in the King and Zeng paper. – Luca Dibo Dec 06 '14 at 13:03
  • I checked, but the main problem for me is conceptual: since logistic regression belongs to the GLM family, we know that the parameters are estimated through the Fisher scoring algorithm and that there is no closed-form expression for the estimator. If this is true, how can I know that the intercept is biased? – Luca Dibo Dec 07 '14 at 20:55
  • @LucaDibo Take a look at pp. 703-704 of the [King and Zeng paper](http://gking.harvard.edu/files/baby0s.pdf#Page=11) for the intuition on the single regressor case. – dimitriy Dec 07 '14 at 22:42
  • Logit is one of the main tools to model defaults, btw. – Aksakal Jan 16 '15 at 21:40
  • Yes @Aksakal, I know, but you can't use random sampling; you have to use case-control sampling. If you use random sampling, the coefficients are biased even in large samples. So the problem here is that if you use a case-control sample, the intercept is biased but the other parameters are consistent. I can't understand the proof. – Luca Dibo Jan 16 '15 at 21:45
  • In my case, we don't sample. We collect all the mortgages that are available. – Aksakal Jan 16 '15 at 22:04
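On the case-control point raised in the comments, the following is a rough sketch of the prior correction described by King and Zeng: subtract ln[((1 − τ)/τ) · (ȳ/(1 − ȳ))] from the fitted intercept, where τ is the population fraction of events and ȳ is the fraction of events in the sample. Everything else here is an assumption made for illustration: a single binary regressor (so the logit fit is saturated and has a closed form), made-up true coefficients, and expected cell frequencies used in place of a finite sample.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical population model: P(Y=1 | X=x) = sigmoid(beta0 + beta1 * x).
beta0, beta1 = -4.0, 1.0
p_x1 = 0.5             # assumed share of the population with X = 1
keep_nonevents = 0.05  # case-control design: keep all events, 5% of non-events

# Expected cell frequencies (up to a common scale) in the case-control sample.
cells = {}
for x, px in ((0, 1.0 - p_x1), (1, p_x1)):
    p_event = sigmoid(beta0 + beta1 * x)
    cells[(x, 1)] = px * p_event
    cells[(x, 0)] = px * (1.0 - p_event) * keep_nonevents

# With one binary regressor the logit MLE is just the cell log-odds.
raw_intercept = math.log(cells[(0, 1)] / cells[(0, 0)])
raw_slope = math.log(cells[(1, 1)] / cells[(1, 0)]) - raw_intercept

# tau: event fraction in the population; ybar: event fraction in the sample.
tau = sum(px * sigmoid(beta0 + beta1 * x)
          for x, px in ((0, 1.0 - p_x1), (1, p_x1)))
events = cells[(0, 1)] + cells[(1, 1)]
ybar = events / sum(cells.values())

# Prior correction of the intercept only; the slope is left untouched.
correction = math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))
corrected_intercept = raw_intercept - correction

print(f"slope:               {raw_slope:.4f} (true {beta1})")
print(f"raw intercept:       {raw_intercept:.4f} (true {beta0})")
print(f"corrected intercept: {corrected_intercept:.4f}")
```

In this setup the slope is recovered without any adjustment, while the raw intercept is shifted upward by exactly the correction term, which is consistent with the statement that only the intercept is affected by the sampling design.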
