I'm trying to predict a binary outcome using 50 continuous explanatory variables (the range of most of the variables is $-\infty$ to $\infty$). My data set has almost 24,000 rows. When I run `glm` in R, I get:
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
I've read the other answers suggesting that perfect separation might be occurring, but I'm confident that isn't the case in my data (though quasi-complete separation could exist; how can I test whether that's the case?). Removing some variables sometimes makes the "did not converge" warning go away, but not always.
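A minimal sketch of two checks along these lines, run on simulated data rather than the real data set. The data frame `d`, its columns `x1`, `x2`, `y`, and the helper `separates` are hypothetical names made up for illustration. The first check asks, one predictor at a time, whether its ranges in the two outcome classes fail to overlap. The second fits the model and counts how many fitted probabilities sit at 0 or 1.

```r
# Simulated stand-in data (assumption: the real data has 50 predictors
# and roughly 24,000 rows; two predictors are enough to show the idea).
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# y is fully determined by the sign of x1, so x1 alone separates the classes.
d$y <- as.integer(d$x1 > 0)

# Hypothetical helper: TRUE when the predictor's ranges in the two classes
# do not overlap (or touch only at an endpoint), so that a single cutoff
# on this one variable classifies every row correctly.
separates <- function(x, y) {
  r0 <- range(x[y == 0])
  r1 <- range(x[y == 1])
  r0[2] <= r1[1] || r1[2] <= r0[1]
}

# Check 1: flag each predictor that separates the outcome on its own.
sep <- sapply(d[c("x1", "x2")], separates, y = d$y)
print(sep)

# Check 2: fit the model and count fitted probabilities within eps of 0 or 1.
# Warnings are suppressed here only because they are expected for this data.
fit <- suppressWarnings(glm(y ~ x1 + x2, data = d, family = binomial))
eps <- 1e-6
n_extreme <- sum(fitted(fit) < eps | fitted(fit) > 1 - eps)
print(n_extreme)
```

The per-variable check only catches separation by a single predictor. Separation produced by a linear combination of several predictors would pass it unflagged, so a clean result here does not rule separation out.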
I also tried fitting the same variables with the `bayesglm` function and got the same warnings.
What steps would you take to figure out exactly what's going on here? How do you figure out which variables are causing the problems?