Case Control Sampling in Logistic Regression

Question

In this lecture by Hastie and Tibshirani, they mention that with case-control samples we can estimate the other logistic regression parameters accurately, but the intercept term is incorrect. They also give a formula to correct the intercept term.

Could you please mathematically explain why this happens?

Additionally, could you please explain why exactly case-control sampling is necessary, and which methods of classification are particularly sensitive to imbalanced priors?

We have no access to that lecture, so it is hard to comment. However, I would not say that the constant is incorrect, I would say it is of little substantive interest as it represents the design rather than an empirical estimate of the prevalence of the event. — Maarten Buis, Dec 22 '16 at 09:29
Thanks @MaartenBuis for your reply. I have edited the link to the lecture; it should be accessible now. Anyway, can you please explain why changing the case-control ratio would only affect the constant and not the other regression parameters? — Pradnyesh Joshi, Dec 22 '16 at 13:58
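A sketch of the standard argument, under the assumption (not stated in the thread) that cases are selected into the sample with probability $\tau_1$ and controls with probability $\tau_0$, independently of $X$. The symbols $\tau_0, \tau_1, \pi, \tilde\pi$ are introduced here for illustration. Write $p(x) = P(Y=1 \mid X=x)$ for the population model, $\operatorname{logit} p(x) = \beta_0 + \beta^\top x$, and let $S=1$ mean "selected into the sample". By Bayes' rule,

$$P(Y=1 \mid X=x, S=1) = \frac{\tau_1\, p(x)}{\tau_1\, p(x) + \tau_0\,\bigl(1-p(x)\bigr)},$$

$$\operatorname{logit} P(Y=1 \mid X=x, S=1) = \log\frac{\tau_1\, p(x)}{\tau_0\,\bigl(1-p(x)\bigr)} = \underbrace{\beta_0 + \log\frac{\tau_1}{\tau_0}}_{\beta_0^*} + \beta^\top x.$$

So the sampled data follow a logistic model with the same slopes $\beta$ and a shifted intercept $\beta_0^*$. With population prevalence $\pi$, the sample prevalence is $\tilde\pi = \pi\tau_1 / \bigl(\pi\tau_1 + (1-\pi)\tau_0\bigr)$, hence $\tilde\pi/(1-\tilde\pi) = (\tau_1/\tau_0)\,\pi/(1-\pi)$ and

$$\beta_0 = \beta_0^* + \log\frac{\pi}{1-\pi} - \log\frac{\tilde\pi}{1-\tilde\pi}.$$

The shift depends only on the sampling fractions, which is why changing the case-control ratio moves the constant but not the slopes; recovering $\beta_0$ requires knowing (or estimating) $\pi$ from outside the sample.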
Do you understand what a case-control study is? After Trevor Hastie discussed the logistic regression model, Rob Tibshirani interrupted and gave a great explanation of what a case-control study is and why it is often practical when a prospective study is not. Trevor qualified what he said by adding "if the model is correct". Unfortunately, I do not follow why the intercept should be 0 but a transformation corrects it. — Michael R. Chernick, Dec 22 '16 at 16:05
Logistic regression is so widely used because of its link function, the logit. For a binary outcome, this has attractive properties, among them its unbiasedness under different sampling schemes, including case-control. Please find the reasoning here: https://stats.stackexchange.com/questions/296320/during-oversampling-of-rare-events-why-are-the-beta-coefficients-of-the-indepen/297903#297903 — user8463728, Aug 14 '17 at 21:18
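A quick simulation sketch (pure Python, no third-party libraries) that checks the claim numerically. Everything here is an illustrative assumption, not something from the lecture: a single N(0, 1) covariate, hypothetical true values β0 = -4 and β1 = 1, a balanced sample of all cases plus an equal number of random controls, and a hand-rolled Newton-Raphson fit (`fit_logistic` and `simulate` are hypothetical helpers, not library functions). The intercept is corrected with β0 = β0* + log(π/(1-π)) - log(π̃/(1-π̃)), where π is the population prevalence and π̃ the sample prevalence; this assumes π is known, as it is in a simulation.

```python
import math
import random


def expit(z):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit P(Y=1|x) = b0 + b1*x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system H * delta = g in closed form.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


def simulate(n_pop=100_000, beta0=-4.0, beta1=1.0, seed=1):
    rng = random.Random(seed)
    # Hypothetical population: one N(0, 1) covariate and a rare outcome.
    pop = []
    for _ in range(n_pop):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < expit(beta0 + beta1 * x) else 0
        pop.append((x, y))
    cases = [x for x, y in pop if y == 1]
    controls = [x for x, y in pop if y == 0]
    pi = len(cases) / n_pop  # population prevalence (known only because we simulated it)

    # Case-control sample: every case plus an equal number of random controls.
    sampled_controls = rng.sample(controls, len(cases))
    xs = cases + sampled_controls
    ys = [1] * len(cases) + [0] * len(sampled_controls)
    pi_tilde = len(cases) / len(xs)  # sample prevalence, 0.5 by design

    b0_hat, b1_hat = fit_logistic(xs, ys)
    # Intercept correction: add the population log-odds, subtract the sample log-odds.
    b0_corrected = b0_hat + math.log(pi / (1 - pi)) - math.log(pi_tilde / (1 - pi_tilde))
    return b0_hat, b1_hat, b0_corrected


b0_hat, b1_hat, b0_corrected = simulate()
print(f"naive intercept     {b0_hat:.3f}")
print(f"slope               {b1_hat:.3f}  (true 1.0)")
print(f"corrected intercept {b0_corrected:.3f}  (true -4.0)")
```

Up to sampling noise, the slope should come out close to 1 and the corrected intercept close to -4, while the naive intercept sits far from -4 because the balanced design raises the case fraction from a few percent in the population to 50% in the sample.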
[Scortchi's answer here](https://stats.stackexchange.com/a/68726/1352) points to a reference. — Stephan Kolassa, Aug 04 '18 at 21:34
