I have a dataset in which the event rate is very low (40,000 events out of $12\cdot10^5$ observations). I am applying logistic regression to it. In a discussion with someone, it came up that logistic regression would not give a good confusion matrix on data with such a low event rate. However, because of the business problem and the way it has been defined, I can't increase the number of events beyond 40,000, though I agree that I could delete some of the nonevent population.
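For a sense of scale, the counts above translate into the following event rate and class ratio (plain arithmetic on the stated numbers):

```python
events = 40_000
total = 12 * 10**5          # 1,200,000 observations
nonevents = total - events  # 1,160,000 nonevents

event_rate = events / total
print(f"event rate: {event_rate:.2%}")                     # 3.33%
print(f"nonevents per event: {nonevents / events:.0f}")    # 29
```

So roughly 1 observation in 30 is an event, or about 29 nonevents for every event.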
Please tell me your views on this, specifically:
- Does the accuracy of logistic regression depend on the event rate, or is there a recommended minimum event rate?
- Is there any special technique for data with a low event rate?
- Would deleting part of my nonevent population improve the accuracy of my model?
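To make the third question concrete, "deleting nonevents" could mean keeping every event and a random fraction of the nonevents. The sketch below is only an illustration under assumptions: the labels are synthetic stand-ins with the same counts as my data, and `downsample_nonevents` and `keep_frac` are hypothetical names I chose, not part of any library. It says nothing about whether this helps the model; it only shows how the training sample's event rate changes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in labels: 40,000 events (1) among 1,200,000 rows.
n_total, n_events = 1_200_000, 40_000
y = np.zeros(n_total, dtype=int)
y[:n_events] = 1
rng.shuffle(y)

def downsample_nonevents(y, keep_frac, rng):
    """Hypothetical helper: keep all events and a random keep_frac of nonevents.

    Returns the sorted row indices of the retained observations.
    """
    event_idx = np.flatnonzero(y == 1)
    nonevent_idx = np.flatnonzero(y == 0)
    n_keep = int(round(keep_frac * nonevent_idx.size))
    kept_nonevents = rng.choice(nonevent_idx, size=n_keep, replace=False)
    return np.sort(np.concatenate([event_idx, kept_nonevents]))

idx = downsample_nonevents(y, keep_frac=0.25, rng=rng)
print(len(idx), y[idx].mean())  # 330000 rows, event rate about 0.121
```

With `keep_frac=0.25`, 290,000 of the 1,160,000 nonevents are kept, so the event rate in the retained sample rises from about 3.3% to about 12.1%.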
I am new to statistical modeling, so please forgive my ignorance, and please point out any related issues I should be thinking about.
Thanks,