Data count regression with a truncated distribution

Question

Imagine that we are conducting an experiment to test the effectiveness of a treatment, where the «level of illness» is measured by a count that is distributed as a negative binomial (NB). The plan is to use a mixed GLM for NB distributed counts.

First, it doesn't seem to make sense to treat people who are not sick at all, so one would want to remove the $0$s from the initial distribution and see how the treatment affects sick people. I first thought this was no big deal, because one can shift the distribution by $-1$ and the result still fits an NB distribution reasonably well. The problem is that the same shift would have to be applied after the treatment, which means that cured people would fall at $-1$, and that makes no sense from an NB point of view.
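To make the shifting problem concrete, here is a minimal sketch. The counts are made-up values for illustration only, not data from any experiment:

```python
# Hypothetical baseline counts for enrolled (sick) participants: all >= 1
baseline = [1, 1, 2, 3, 5]
# Hypothetical follow-up counts for the same participants; 0 means "cured"
followup = [0, 1, 1, 0, 3]

shift = -1  # the shift that moves the baseline support from {1, 2, ...} to {0, 1, ...}

baseline_shifted = [y + shift for y in baseline]
followup_shifted = [y + shift for y in followup]

# Values below 0 lie outside the support of any NB distribution
invalid = [y for y in followup_shifted if y < 0]

print(baseline_shifted)  # [0, 0, 1, 2, 4]
print(followup_shifted)  # [-1, 0, 0, -1, 2]
print(len(invalid))      # 2
```

The baseline looks like ordinary count data after the shift, but every cured participant ends up at $-1$ at follow-up.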

So I can only see two options:

1. Actually include non-sick people in the initial sample (which is not as bad as it sounds from a practical point of view, as we are dealing with minor mental issues and non-invasive treatments, so this would be technically possible).
2. Keep the truncated initial sample, with no shift, and be able to argue that it's still ok to use a GLM on count data where the distribution has its $0$s truncated off at $t=0$.

My question is: does option $2$ look reasonable or is it really that bad?

Thanks for any ideas or comments.

score 0 · Answer 1 · answered May 01 '21 at 11:55

I think it's perfectly reasonable to truncate the original sample; the population you're interested in consists of people who are affected by a range of conditions. Neurotypical people aren't in your population and you can remove those data.

I'm afraid I'm not intimately familiar with the NB distribution, but my question would be the same regardless: are you certain this is an appropriate distribution? If so (based on previous models, theory, et cetera), then yes, it is reasonable. Your population fits a distribution; I don't see it as truncated, but rather as a clearly specified subset of the general population.

Thanks for your answer. Yes, the negative binomial fits perfectly in the general population, but the point is that when you remove neurotypical people then it doesn't any more, because 0 is supposed to be the more frequent value and they were all taken off. — Arnaud Mortier, May 01 '21 at 12:38
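The point in this comment can be illustrated with a zero-truncated NB, where the NB probability mass is renormalized over $y \geq 1$, i.e. $P(Y = y \mid Y > 0) = P(Y = y) / (1 - P(Y = 0))$. The following is a minimal sketch using only the standard library. The parameterization (number of failures before the $r$-th success, with success probability $p$), the function names and the parameter values are assumptions chosen for illustration, not something specified in the thread:

```python
import math

def nb_pmf(y, r, p):
    """NB pmf: probability of y failures before the r-th success (success prob p)."""
    if y < 0:
        return 0.0
    log_coef = math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
    return math.exp(log_coef + r * math.log(p) + y * math.log(1.0 - p))

def zt_nb_pmf(y, r, p):
    """Zero-truncated NB pmf: P(Y = y | Y > 0)."""
    if y < 1:
        return 0.0
    return nb_pmf(y, r, p) / (1.0 - nb_pmf(0, r, p))

# Hypothetical parameters for which the untruncated NB has its mode at 0
r, p = 1.0, 0.5

# Truncated masses over a long range of y; they should sum to (almost) 1
total = sum(zt_nb_pmf(y, r, p) for y in range(1, 200))

print(nb_pmf(0, r, p))     # mass at 0 before truncation
print(zt_nb_pmf(0, r, p))  # no mass at 0 after truncation
print(zt_nb_pmf(1, r, p))  # the mode moves to 1 after truncation
```

With these values, half of the original mass sits at $0$, so after truncation the remaining probabilities are doubled and the most frequent value becomes $1$. This is why a plain NB no longer describes the truncated sample, whereas a zero-truncated likelihood accounts for the missing zeros explicitly.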
