I've seen plenty of discussion about whether a basic Poisson regression is nested within a zero-inflated Poisson regression. For instance, this site argues that it is, since the latter includes extra parameters to model additional zeroes but otherwise has the same Poisson regression parameters as the former, though the page does include a reference that disagrees.

What I can't find information about is whether a zero-truncated Poisson and a basic Poisson are nested. If the zero-truncated Poisson is just a Poisson with the extra stipulation that the probability of a zero count is zero, then I guess it sounds like they could be, but I was hoping for a more definitive answer.

The reason I'm wondering is that it will affect whether I should use Vuong's test (for non-nested models), or a more basic chi-square test based on the difference in loglikelihoods (for nested models).

Wilson (2015) talks about whether a Vuong test is appropriate for comparing the zero-inflated regression with the basic one, but I can't find a source that discusses zero-truncated data.

amoeba
Justin

2 Answers

The basic Poisson can be thought of as nested inside a more general form:

$p(x) = (1-p)\frac{e^{-\lambda}\lambda^x}{x!} + p\,\mathbf{1}(x=0)$

When $p = 0$, we have the basic Poisson. When $p = -\exp\{-\lambda\}/(1 -\exp\{-\lambda\})$, we have the zero-truncated Poisson. When $-\exp\{-\lambda\}/(1 -\exp\{-\lambda\}) < p < 0$, we have a zero-reduced Poisson. When $0 < p < 1$, we have a zero-inflated Poisson, and we have a degenerate distribution at $p = 1$.
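As a quick check of the zero-truncated case (a sketch using only the symbols of the formula above), setting the probability of a zero to zero and solving for $p$ gives

$$
p(0) = (1-p)\,e^{-\lambda} + p = 0
\quad\Longrightarrow\quad
p = -\frac{e^{-\lambda}}{1-e^{-\lambda}},
\qquad
1 - p = \frac{1}{1-e^{-\lambda}},
$$

so that for $x \ge 1$

$$
p(x) = \frac{1}{1-e^{-\lambda}} \cdot \frac{e^{-\lambda}\lambda^x}{x!},
$$

which is the usual zero-truncated Poisson pmf: the Poisson probabilities rescaled by $1-e^{-\lambda}$ so that they sum to 1 over the positive integers.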

So it seems to me that the nested version of the Vuong test, or the chi-square as you suggest, would be appropriate in your case. Note, though, that the chi-square can have problems due to the small probabilities of "large" (relative to $\lambda$) observations. You'd probably want to use a bootstrap to get the p-value for the chi-square statistic instead of relying on the asymptotics unless you've got rather a lot of data.

jbowman
  • Thanks @jbowman - that's the sort of more rigorous answer I was hoping for. I'm unclear though: I thought the whole point of a Vuong test was for non-nested models, so even though it goes beyond my original post, could you provide a little more information about the "nested version of the Vuong test"? To be clear about the source of my confusion: up until this moment I was only aware of the `vuong` function in package `pscl` in R, which says it's for non-nested models. I just googled and found function `vuongtest` in package `nonnest2`, which includes an argument 'nested'. Is that it? – Justin Aug 28 '16 at 18:50
  • Yes, that is. Actually, the Wikipedia page https://en.wikipedia.org/wiki/Vuong%27s_closeness_test on the Vuong test is mildly helpful (often it's not so much) in describing the difference. – jbowman Aug 28 '16 at 19:06
  • Great. Follow-up question: what would the difference in df between a plain Poisson and a zero-truncated Poisson be, for the chisq test? Apologies if that should be obvious from your description, but I'm afraid I'm not as mathy as I should be. – Justin Aug 28 '16 at 19:19
  • Just divide all the probabilities by $(1-\exp\{-\lambda\})$ (and set $p(0) = 0$, of course). That makes all the probabilities sum to 1. I have to say, though, that I suspect a chi-square wouldn't be as powerful as just testing the probability of seeing 0 "0" values given that the estimated probability of seeing a 0 is $\exp\{-\lambda\}$. Not sure about that, though. The idea is that the relative probabilities of all integers > 0 are the same, and if you observe even one "0" you don't need a test to reject zero-truncation, so focusing on p(0 | non-zero-truncated) might give a better test. – jbowman Aug 28 '16 at 20:59
  • NB *Both* the Poisson & the zero-truncated Poisson are special cases of the distribution you've defined. One isn't nested in the other. So you can't use Wilks' theorem to derive an asymptotic chi-squared distribution for twice the log likelihood ratio, whichever you consider to be the null hypothesis. (I think there are some regularity conditions for the Vuong test too.) – Scortchi - Reinstate Monica Aug 30 '16 at 10:53
  • @Scortchi - arrgghhhhh.... what was I thinking? Well I'll probably delete the answer, or I'll have to rewrite it so that it is correct. – jbowman Aug 31 '16 at 16:53
  • @jbowman: Please don't delete it: (1) it illustrates that the two models mentioned aren't nested, & (2) considering tests about or confidence intervals for $p$ in this broader model is a useful approach in itself (have you ever really had to decide between a Poisson & a zero-truncated Poisson based on the observations?). – Scortchi - Reinstate Monica Sep 01 '16 at 08:18
  • @Scortchi I am curious about the definition of "nested" you are applying. Although I don't disagree with your conclusion, I come to it from a slightly different point of view: yes, the Poisson is nested within this family (because it arises by restricting to $p=0$) but various conclusions about asymptotic distributions of MLE parameter estimates for $p$ do not apply because this value of $p$ *lies on the boundary* of the family. Am I missing some important distinction? – whuber Apr 15 '17 at 20:55
  • @whuber, I was going to comment/provide an answer about the same point. The [referenced link](http://statisticalhorizons.com/zero-inflated-models) does note: "... although the chi-square distribution may need some adjustment because the restriction is on the boundary of the parameter space" – Ben Bolker Apr 15 '17 at 20:58
  • @whuber. The two specific models the OP's interested in aren't nested one within the other, & each has only one parameter to estimate - $\lambda$. That's all I meant to point out, & I hadn't got as far as you in thinking about inference about $p$ in the broader model given in this answer. – Scortchi - Reinstate Monica Apr 17 '17 at 13:31

I have only just come across this. To avoid confusion: I am the Wilson of Wilson (2015) referenced in the original question, which asks whether the Poisson and truncated Poisson models are nested, non-nested, etc.

Slightly simplifying: a smaller model is nested in a larger model if the larger model reduces to the smaller one when a subset of its parameters is fixed at stated values; two models are overlapping if they both reduce to the same model when subsets of their respective parameters are fixed at certain values; and they are non-nested if, no matter how the parameters are fixed, one cannot reduce to the other. According to this definition the truncated Poisson and standard Poisson are non-nested.

HOWEVER, and this is a point that seems to have been overlooked by many, Vuong's distributional theory refers to STRICTLY nested, STRICTLY non-nested, and STRICTLY overlapping models, "STRICTLY" referring to the addition of six restrictions to the basic definitions. These restrictions are not exactly simple, but they do, among other things, mean that Vuong's results about the distribution of log-likelihood ratios are not applicable when models are nested at a boundary of the parameter space (as is the case with the Poisson/zero-inflated Poisson with an identity link for the zero-inflation parameter), or when one model tends to the other as a parameter tends to infinity (as is the case with the Poisson/zero-inflated Poisson when a logit link is used to model the zero-inflation parameter). Vuong advances no theory about the distribution of log-likelihood ratios in these circumstances.

Unfortunately, that is the case here with the Poisson and truncated Poisson distributions: one tends to the other as the parameter tends to infinity. To see this, note that the ratio of the pmfs of the Poisson and truncated Poisson distributions (for counts $x \ge 1$) is $1-\exp\{-\lambda\}$, which tends to 1 as $\lambda$ tends to infinity. Thus the two distributions are not strictly non-nested, or strictly anything for that matter, and Vuong's theory is not applicable.
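Spelling out the ratio mentioned above (a sketch; $f$ and $g$ are labels introduced here for the Poisson and zero-truncated Poisson pmfs, and are not notation from Vuong's paper): for $x \ge 1$,

$$
f(x) = \frac{e^{-\lambda}\lambda^x}{x!},
\qquad
g(x) = \frac{f(x)}{1-e^{-\lambda}},
\qquad
\frac{f(x)}{g(x)} = 1 - e^{-\lambda} \;\longrightarrow\; 1
\quad\text{as } \lambda \to \infty .
$$

The ratio does not depend on $x$, so the two distributions differ only through the mass $e^{-\lambda}$ that the Poisson places on zero, and that mass vanishes in the limit.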

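The following R code simulates the distribution of the log-likelihood ratio between the truncated Poisson and Poisson models, for data generated from a truncated Poisson. It requires the VGAM package.

library(VGAM)  # provides rpospois(), vglm() and the pospoisson() family

n <- 30          # sample size in each replicate
lambda1 <- 1     # rate parameter of the zero-truncated Poisson
nsim <- 10000    # number of simulated data sets
H <- numeric(nsim)
for (i in seq_len(nsim)) {
  y <- rpospois(n, lambda1)                            # zero-truncated Poisson sample
  fit1 <- vglm(y ~ 1, pospoisson())                    # zero-truncated Poisson fit
  fit2 <- glm(y ~ 1, family = poisson(link = "log"))   # ordinary Poisson fit
  # log-likelihood ratio: truncated fit minus ordinary fit
  H[i] <- as.numeric(logLik(fit1)) - as.numeric(logLik(fit2))
}

hist(H, col = "lemonchiffon")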
Pauljw11