If there are 0's in the contingency table and we're fitting nested Poisson/loglinear models (using R's `glm` function) for a likelihood ratio test, do we need to adjust the data prior to fitting the models (e.g. add 1/2 to all the counts)? Obviously some parameters cannot be estimated without some adjustment, but how does the adjustment, or the lack of it, affect the LR test?
-
presumably the `glm` routine would bonk if it could not handle zeros. have you tried it? – shabbychef Jun 06 '11 at 03:16
-
Yes, it doesn't crash, but depending on the formula (e.g. in a saturated model), some of the parameters can have effectively infinite standard errors. My question is whether this is a problem when doing a likelihood ratio test. You can still calculate a likelihood even if some parameters aren't estimated; those parameters just won't contribute to the likelihood. What's the standard practice and why? – BR1 Jun 06 '11 at 13:16
1 Answer
One of the strengths of regression modeling generally is that you can smooth over areas of no data, though as you have noticed, there are occasionally problems in estimating parameters. I would suggest that if you're getting things like infinite standard errors, it's time to reconsider your modeling approach a bit.
One particular note of caution: there is a difference between having no counts in a particular stratum and it being impossible for there to be counts in that stratum. For example, imagine you're working on a study of psychological disorders for the U.S. Navy between, say, 2000 and 2009, and have binary regression terms for both "Is a Woman" and "Serves on a Submarine". A regression model may be able to estimate effects where both variables = 1 despite having a zero count where both = 1. However, that inference wouldn't be valid, because such a circumstance is impossible. This problem is called "non-positivity" and is occasionally a problem in highly stratified models.
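To make the likelihood ratio question concrete, here is a minimal sketch in plain Python rather than R. It assumes a two-way table, compares the independence model against the saturated model, and treats the zero as a sampling zero (a cell that could have been observed), not an impossible one like the submarine example. The counts and the function name `g2_independence` are made up for illustration. It uses the closed-form fitted counts for independence and the usual convention that a zero cell contributes nothing to the deviance, so the statistic stays finite without any adjustment.

```python
import math

def g2_independence(table):
    """LR (deviance) statistic for independence vs. the saturated
    Poisson loglinear model in a two-way table, with 0 * log(0) = 0."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, y in enumerate(row):
            # Fitted count under independence (main effects only).
            mu = row_totals[i] * col_totals[j] / n
            if y > 0:  # a zero cell adds nothing to the sum
                g2 += 2.0 * y * math.log(y / mu)
    return g2

# Hypothetical 2x2 counts with one sampling zero (illustration only).
observed = [[10, 5], [8, 0]]
# The same table after adding 1/2 to every cell.
adjusted = [[y + 0.5 for y in row] for row in observed]

g2_raw = g2_independence(observed)
g2_adj = g2_independence(adjusted)
print(f"G^2 on raw counts:        {g2_raw:.3f}")
print(f"G^2 after adding 1/2:     {g2_adj:.3f}")
```

With these made-up counts the statistic is smaller after the adjustment, so adding 1/2 is not a neutral fix: it changes the test itself. Whether that shift matters in a real analysis depends on the table and the models being compared.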

-
@skyguy94 Oddly enough I don't - I knew that, I had just forgotten to note the use of a retrospective data set >.<. Edited to reflect that. – Fomite Apr 22 '12 at 22:53
-
Re: "A regression model may be able to estimate effects where both variables = 1, **or interactions between the two**" - I don't think that's true. If you have two binary predictors that are never '1' together, then the interaction is constant (it is always '0'), so its effect is not identified. – Macro May 02 '12 at 01:55
-
@Macro You're right, I'm editing slightly. I was thinking for terms where they're not binary indicators. – Fomite May 02 '12 at 01:58
-
(+1) So, issues with non-plausibility of the case where both=1 aside, the model based estimate would just be the sum of the two marginal effects, which we know can be very misleading in its own right :) – Macro May 02 '12 at 02:01