I'm studying logistic regression to estimate the probability of default of SMEs. Fortunately, the event (a firm's default) is rare.
King and Zeng tell us that "logistic regression can sharply underestimate the probability of rare events" (Logistic Regression in Rare Events Data, 2001). This is because the logistic regression coefficient estimates are biased in these situations.
Could someone tell me which paper proves that the logistic regression intercept is biased when the event (Y=1) is rare?
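
To make the question concrete, here is a minimal simulation sketch of the kind of bias I mean. It is not taken from the paper: it assumes an intercept-only model, where the maximum likelihood estimate of the intercept is simply logit(ȳ), and the sample size, event rate, and function name are my own illustrative choices. Samples with no events at all are dropped, since the estimate is not finite there.

```python
import numpy as np

def simulate_intercept_bias(n=200, p=0.05, reps=200_000, seed=0):
    """Monte Carlo estimate of E[beta0_hat] - beta0 in an intercept-only logit model.

    n, p, reps and seed are illustrative assumptions, not values from any paper.
    """
    rng = np.random.default_rng(seed)
    # Number of events (Y=1) in each simulated sample of size n.
    events = rng.binomial(n, p, size=reps)
    # Drop samples with 0 or n events: the MLE is infinite in those cases.
    events = events[(events > 0) & (events < n)]
    ybar = events / n
    # In an intercept-only model the MLE of the intercept is logit(ybar).
    beta0_hat = np.log(ybar / (1 - ybar))
    beta0_true = np.log(p / (1 - p))
    return beta0_hat.mean() - beta0_true

bias = simulate_intercept_bias()
print(f"mean(beta0_hat) - beta0 = {bias:.4f}")
```

In this setup the average estimated intercept comes out below the true one, i.e. the bias is negative on the log-odds scale. This only illustrates the intercept-only case under the stated assumptions; it is not a proof, which is what I am looking for.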