
This is a subtle question that I don't think has been asked precisely, so please read carefully before voting to close:

It's well known that GLMs, notably logistic regression, can spit out bizarre output with little to no warning or help to the analyst when the data are sparse or when there is separation or quasiseparation.

GLMs estimated with the Newton-Raphson method can terminate early because some fitted probabilities are numerically one or zero; floating-point arithmetic then lacks the precision to determine whether a boundary estimate maximizes the likelihood.
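To make the failure mode concrete, here is a minimal sketch of it, assuming a toy logistic model with a single coefficient and no intercept. The function names `sigmoid` and `newton_logistic` and both data sets are made up for illustration; this is not how any particular GLM routine is implemented. On the perfectly separated data the Newton-Raphson steps never shrink and the coefficient keeps growing until the iteration budget runs out, whereas on the overlapping data the same routine settles on a finite estimate.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def newton_logistic(x, y, max_iter=25, tol=1e-8):
    """Newton-Raphson for the toy model P(y = 1) = sigmoid(beta * x).

    Returns (beta, iterations_used, converged).
    """
    beta = 0.0
    for it in range(1, max_iter + 1):
        p = [sigmoid(beta * xi) for xi in x]
        # Score (gradient of the log-likelihood) and observed information.
        score = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        info = sum(xi * xi * pi * (1.0 - pi) for xi, pi in zip(x, p))
        if info == 0.0:
            # Every fitted probability is numerically 0 or 1: no Newton step exists.
            return beta, it, False
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta, it, True
    return beta, max_iter, False


x = [-2.0, -1.0, 1.0, 2.0]

# Perfect separation: y = 1 exactly when x > 0, so the MLE is at infinity.
beta_sep, n_sep, ok_sep = newton_logistic(x, [0, 0, 1, 1])

# Overlapping classes: a finite MLE exists.
beta_mix, n_mix, ok_mix = newton_logistic(x, [0, 1, 0, 1])

print("separated:  ", beta_sep, n_sep, ok_sep)
print("overlapping:", beta_mix, n_mix, ok_mix)
```

In the separated case the only visible symptoms are a large coefficient and a failure to meet the step tolerance, which is exactly why the output alone does not say whether the true maximizer is finite.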

Are there algorithms or other approaches for finding and reporting possibly infinite coefficients in bivariate or multivariate GLMs that terminate early due to numerical instability?

Addendum:

While this question has excellent answers, it presumes that we already know separation has occurred, which is not necessarily the case. In multiple dimensions, for instance, separation can be very hard to detect.

kjetil b halvorsen
AdamO
  • Are you asking (as you seem to have written) for "algorithms ... that terminate early due to numerical instability" or are you looking for algorithms that do **not** terminate early (or are numerically stable, or produce accurate results in all cases, or ...)? – whuber Sep 10 '19 at 20:08
  • @whuber More the latter. Rather, suppose my logistic regression explodes. Is there a method or routine I can use to determine whether a unique boundary solution exists, whether no unique solution exists, or perhaps whether there is a range of solutions? – AdamO Sep 10 '19 at 20:31
  • I think the problem isn't with the solution "exploding" or even being on the boundary of a set, but with the fact that in such perfect-separation cases there is an entire interval of solutions. That's what makes the possibility of convergence to a unique solution problematic. From this point of view, the general issue is "what is/can be done when the likelihood of a statistical model is maximized at a set of points where each point is a limit of the others?" – whuber Sep 10 '19 at 21:47

0 Answers