I have non-count data with a huge number of zeros in the target variable. I need to fit a model that is a mixture of a Dirac delta function at zero and a normal distribution parametrized by mean $X\beta$ and variance $\sigma^2$, with mixing proportion $\pi$, i.e.

$$ y \sim \left\{ \begin{array}{cl} 0 & \text{ with probability }\pi \\ \mathcal{N}\left(X \beta, \sigma^2 \right) & \text{ with probability } 1-\pi\end{array} \right.$$

to account for the excess zeros. Could you provide me with any references about such models? Or is there some approach that is better than the above for continuous, zero-inflated data?
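For concreteness, here is a minimal sketch of how the model above could be fitted by maximum likelihood. It rests on two assumptions that go beyond the question as stated: $\pi$ has no covariates, and every exact zero comes from the point mass (the normal component produces an exact zero with probability 0). Under those assumptions the likelihood factorizes into a Bernoulli part for $\pi$ and a normal part for $(\beta, \sigma^2)$, so $\hat\pi$ is the observed share of zeros and $\hat\beta$ is ordinary least squares on the non-zero rows. The function name `fit_zero_inflated_normal` and the simulated data are hypothetical, used only for illustration.

```python
import numpy as np


def fit_zero_inflated_normal(X, y):
    """ML fit of a point-mass-at-zero / normal mixture (illustrative sketch).

    Assumes exact zeros arise only from the point mass, so the likelihood
    splits into a Bernoulli part (pi) and a normal part (beta, sigma^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    is_zero = y == 0
    # Bernoulli part: the MLE of pi is the observed proportion of zeros.
    pi_hat = is_zero.mean()

    # Normal part: least squares on the non-zero observations only.
    X_nz, y_nz = X[~is_zero], y[~is_zero]
    beta_hat, *_ = np.linalg.lstsq(X_nz, y_nz, rcond=None)

    # ML variance estimate: divides by the number of non-zero rows.
    resid = y_nz - X_nz @ beta_hat
    sigma2_hat = np.mean(resid ** 2)
    return pi_hat, beta_hat, sigma2_hat


# Simulated data with made-up parameters, only to exercise the function.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, -2.0])
pi_true, sigma_true = 0.7, 0.5

y = X @ beta_true + rng.normal(scale=sigma_true, size=n)
y[rng.random(n) < pi_true] = 0.0  # overwrite a share pi_true with exact zeros

pi_hat, beta_hat, sigma2_hat = fit_zero_inflated_normal(X, y)
print(pi_hat, beta_hat, sigma2_hat)
```

Because the non-zero part is allowed to be negative, nothing here requires the response to be positive. If $\pi$ were allowed to depend on covariates, the Bernoulli part would become a separate logistic (or similar) regression on the zero indicator, while the normal part would stay unchanged.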

AdamO
Tim
  • If you have a genuine expectation that the true distribution is indeed a zero-inflated normal, then just fit that model and be done with it. Whether other approaches are better depends on whether there is an expectation or evidence that some other choice of distribution is a better approximation to nature. Edited to add: it seems an odd sampling process that both deals with continuous data and has a huge number of integers (zeros) in it. – Jacob Socolar Jun 01 '17 at 14:35
  • @user43849 the process that produces such data is very easy to imagine: think of some kind of device that is idle for most of the time, but sometimes fires some continuous signals. – Tim Jun 01 '17 at 14:43
  • Interestingly, the Wiki excerpt of the [tag:zero-inflation] tag says *there is zero-inflated normal regression*. Not that this would help, but I find it curious. – Richard Hardy Jun 01 '17 at 14:46
  • @RichardHardy I wasn't able to find *any* references dealing with such models, this is how the question emerged... – Tim Jun 01 '17 at 14:51
  • When it's non-zero is the response positive? – Glen_b Jun 02 '17 at 04:46
  • @Glen_b unfortunately not, otherwise this would be a Tobit model. – Tim Jun 02 '17 at 07:21
  • Are you sure that there are no covariates that are associated w/ $\pi$? – gung - Reinstate Monica May 25 '21 at 15:30
  • @gung here I'm asking just about the simplest case. – Tim May 25 '21 at 15:36
  • In epidemiology, this is a common problem and the term for such variables is "spike at zero". However, all methods that I am aware of assume non-negative values. But maybe the term helps to find extensions to negative values. – LuckyPal May 25 '21 at 16:10

1 Answer

I have found 2 references so far using zero-inflated normal regression, one in medical research and the other in animal conservation:

Both response variables, Agatston scores of CAC and the number of fledglings in a brood, are probably non-negative, however.

eco-model