I fit a negative binomial regression to a data set with 4 covariates. The count outcome has values up to 600. The model is a mixture model with 2 components, also called a latent class model.

However, I am not sure if the results make sense. These are the parameter estimates for the first component:

intercept 2.1
v1        3.6
v2       -0.5
v3        1.1
v4        2.2

and for the second component:

intercept  100.1
v1          -0.7
v2           2.5
v3          -2.0
v4           0.5

I know that exponentiating the coefficients gives incidence rate ratios, and these would be very high here. I am used to coefficients in the range of -1 to +1, but I have never had a data set with count values this high. Is this the reason for the large parameter estimates, or is something wrong here?
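
To make the scale concrete, here is a minimal sketch that exponentiates the estimates listed above and compares them with the largest observed count. It assumes the usual log link for negative binomial regression; the variable and function names are only for illustration, and nothing is assumed about the scale of v1 to v4, which is not stated here.

```python
import math

# Coefficients as reported above (log scale, assuming a log link).
component_1 = {"intercept": 2.1, "v1": 3.6, "v2": -0.5, "v3": 1.1, "v4": 2.2}
component_2 = {"intercept": 100.1, "v1": -0.7, "v2": 2.5, "v3": -2.0, "v4": 0.5}

MAX_OBSERVED_COUNT = 600  # largest outcome value mentioned above


def rate_ratios(coefs):
    """Exponentiate each log-scale coefficient."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


irr_1 = rate_ratios(component_1)
irr_2 = rate_ratios(component_2)

for label, irr in (("component 1", irr_1), ("component 2", irr_2)):
    print(label)
    for name, value in irr.items():
        print(f"  {name:<10} {value:.4g}")

# Value of the linear predictor that corresponds to the largest observed count.
log_max = math.log(MAX_OBSERVED_COUNT)
print(f"log({MAX_OBSERVED_COUNT}) = {log_max:.2f}")
```

Under a log link, a fitted count of 600 corresponds to a linear predictor of about 6.4, so an intercept of 100.1 only yields in-range predictions if the covariate terms pull the linear predictor down by roughly 94. Whether that is plausible depends on the typical covariate values in that component.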

spore234
  • Yeah that intercept is worrying. How many observations have a high probability of being in that component? (Is it just a handful of outliers?) – Andy W Mar 31 '17 at 12:49
  • @AndyW yes, the two clusters are very imbalanced; component 2 has a weight below 0.1. I am more worried about v1-v4. In this Stata example they do not report the IRR for the intercept: http://stats.idre.ucla.edu/stata/output/negative-binomial-regression/ – spore234 Mar 31 '17 at 12:55
  • I would do some plotting here. Box plots of the DV against the IVs, for sure. – Peter Flom Mar 31 '17 at 12:56
  • Well, I mean it is possible that the model is OK for that component; it depends on typical values for v1 and v3 in that component -- but on its face I would not bet money on out-of-sample predictions from that model. What I would do is look at individual observations that have a high probability of being in that component, and look at scatterplot matrices for all the variables. Explosive predictions from Poisson models are not uncommon, see [here](https://andrewpwheeler.wordpress.com/2014/06/17/poisson-regression-and-crazy-predictions/) for one example. – Andy W Mar 31 '17 at 13:25
  • Given your max value in sample is only 600, I would say predictions of `exp(10)` or more are so far off-base you should be concerned. – Andy W Mar 31 '17 at 13:27
