When I fit a logistic regression model to my data set, I get the warning "glm.fit: fitted probabilities numerically 0 or 1 occurred", and the result for one of my dummy coefficients $x$ is nonsensical (an extremely large estimate and variance). I'm aware that this warning usually indicates a perfect prediction problem, where the predictor $x$ perfectly separates the outcome $y$. In my case, the reason seems to be that $y$ has the same value for all cases where $x=1$. However, in cases where $x=0$, the values of $y$ vary, and $y$ sometimes takes on the same value as it does when $x=1$. So $x$ doesn't perfectly separate $y$, but this still seems to be what causes the problem: when I change just one of the $y$ values for $x=1$, the model estimation works fine.
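
For concreteness, here is a minimal sketch with made-up data of the shape just described (not my real data):

```r
# Made-up data with the pattern described: y varies when x = 0,
# but y is always 1 when x = 1 (quasi-complete separation).
x <- c(0, 0, 0, 0, 0, 0, 1, 1, 1)
y <- c(0, 0, 0, 0, 1, 1, 1, 1, 1)

fit <- glm(y ~ x, family = binomial)
summary(fit)  # the estimate and standard error for x blow up

# With a tighter convergence tolerance, IRLS runs long enough for the
# fitted probabilities at x = 1 to saturate at numerical 1, which (as I
# understand it) is exactly the condition that triggers the warning:
fit2 <- glm(y ~ x, family = binomial,
            control = glm.control(epsilon = 1e-16, maxit = 100))
```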

Why does this warning occur in my case, if the perfect prediction problem isn't the cause? I'm not looking for a solution; I just want to understand the problem.

elias1772
    This seems to be a duplicate of [How to deal with quasi-complete separation in a logistic GLMM?](https://stats.stackexchange.com/questions/38493/how-to-deal-with-quasi-complete-separation-in-a-logistic-glmm) – Jarle Tufto May 26 '21 at 10:39

1 Answer

I suspect you will get this warning if *any* of the fitted probabilities are numerically 1 or 0, rather than only when all of them are. This is because each observation's contribution to the gradient of the loss function with respect to the parameters is proportional to $\hat{y}_i \times (1 - \hat{y}_i)$, where $\hat{y}_i$ is the fitted probability; this factor (the derivative of the logistic function, and also the working weight in the IRLS algorithm that `glm` uses) is numerically zero for such points.
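
You can see this directly from the fit; a small sketch, reusing the hypothetical `fit2` from the question above:

```r
p <- fitted(fit2)   # fitted probabilities, numerically 1 for the x = 1 rows
w <- p * (1 - p)    # per-observation IRLS weight / logistic derivative
cbind(x, y, p, w)   # w is numerically 0 wherever p has saturated at 0 or 1
```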

However, @Jarle_Tufto is also correct to point out that the logistic regression problem has no unique finite coefficient in this situation (see the sketch below; please forgive my ropey drawing).

[hand-drawn sketch]

Say 2/3 of the examples at $x=0$ have $y=0$ and 1/3 have $y=1$, while all of the points at $x=1$ have $y=1$. The intercept is pinned down by the $x=0$ group, but the likelihood keeps improving as the slope grows, approaching its supremum only as the slope tends to infinity; once the fitted probability at $x=1$ saturates at numerical 1, any larger slope gives the same value of the loss function. That is why the estimate and its variance blow up. Adding a regularisation term, so that the fit chooses the smallest such slope, resolves that indeterminacy (and will probably generalise better).
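
For instance (purely as an illustration, using the `logistf` package and the toy data from the question), Firth's penalised likelihood yields a finite, stable slope:

```r
# install.packages("logistf")  # if not already installed
library(logistf)

d <- data.frame(x = c(0, 0, 0, 0, 0, 0, 1, 1, 1),
                y = c(0, 0, 0, 0, 1, 1, 1, 1, 1))

# Firth's penalty (a Jeffreys-prior regulariser) keeps the slope finite
# even though the unpenalised likelihood has no finite maximiser.
pfit <- logistf(y ~ x, data = d)
summary(pfit)  # finite coefficient with a sensible confidence interval
```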

Dikran Marsupial