I've been using stepAIC to narrow down my logistic regression model. However, I get the following warning when I run my model:

glm.fit: fitted probabilities numerically 0 or 1 occurred

I know this means I have complete or quasi-complete separation in my data. On examination of my data, I see the quasi-complete separation and think that it's meaningful. Reading online, I see recommendations to use a Firth penalized regression (logistf) or exact logistic regression (elrm); but neither of these will work with stepAIC. I've also tried bayesglm but I still get the same warning.

How should I select a model when my data has complete separation? How would I do this in R? Is my mistake in my stats or in my understanding of using the packages in R? Any help would be much appreciated!

Sycorax
csharrell

1 Answer

See "High p-values for logistic regression variable that perfectly separates?" and "How to deal with perfect separation in logistic regression?" for some background. If stepwise selection were appropriate (take @gung's advice to read "Algorithms for automatic model selection"), a straightforward approach would be to base the decisions on likelihood-ratio tests (equivalently, on AIC differences between nested models). Under separation the Wald tests will be badly wrong, because the coefficient estimates diverge and their standard errors blow up, whereas the maximized likelihood itself stays well defined.
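As a minimal sketch of that contrast, the base-R example below simulates a toy data set (the variables `x1`, `x2`, `y` and the seed are made up for illustration, not taken from the question) in which `x1` quasi-completely separates the outcome, then compares the Wald table with the likelihood-ratio tests from `drop1()`.

```r
# Toy data (hypothetical): y is always 1 when x1 == 1, mixed when x1 == 0,
# so x1 quasi-completely separates the outcome; x2 is pure noise.
set.seed(1)
n  <- 40
x1 <- rep(c(0, 1), each = n / 2)
x2 <- rnorm(n)
y  <- c(rbinom(n / 2, 1, 0.4), rep(1, n / 2))

# glm() emits the "fitted probabilities numerically 0 or 1" warning here;
# it is suppressed only to keep the output readable.
fit <- suppressWarnings(glm(y ~ x1 + x2, family = binomial))

# Wald tests: the standard error for x1 is enormous, so its p-value is useless.
wald <- coef(summary(fit))
print(wald)

# Likelihood-ratio tests for dropping each term in turn; the table also
# reports the AIC of each reduced model, which is what stepAIC() compares.
lrt <- suppressWarnings(drop1(fit, test = "Chisq"))
print(lrt)
```

The `LRT` and `Pr(>Chi)` columns of `lrt` are the quantities to base keep/drop decisions on; the Wald p-value for `x1` in `wald` should be ignored.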

Perhaps you could take a similar approach with Firth regression. Heinze & Schemper (2002), "A solution to the problem of separation in logistic regression", Statist. Med., 21, pp 2409–19, form a penalized log-likelihood-ratio test statistic for the null hypothesis that a given coefficient is zero, analogously to the unpenalized version, and imply that its asymptotic distribution is also chi-square with one degree of freedom. (By the way, "problem" is rather a loaded term.)
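A rough sketch of how that might look in R, assuming the `logistf` package is installed and using its bundled `sex2` data set (the example analysed by Heinze & Schemper) rather than the asker's data. It also assumes a reasonably recent `logistf` version that provides an `anova()` method for comparing nested fits; treat it as an illustration, not a recommended selection procedure.

```r
library(logistf)  # assumption: logistf is installed
data(sex2)        # example data shipped with logistf; 'dia' shows separation

# Firth-penalized fits of a full model and a nested model without 'dia'.
full    <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)
reduced <- logistf(case ~ age + oc + vic + vicl + vis,       data = sex2)

# By default logistf() reports p-values from penalized likelihood-ratio
# (profile likelihood) tests for each coefficient, not Wald tests.
print(full$prob)

# Penalized likelihood-ratio test comparing the two nested models.
print(anova(full, reduced))
```

A manual backward step would then drop the term with the largest penalized-LR p-value, refit, and repeat.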

Scortchi - Reinstate Monica