
I am trying to troubleshoot model adequacy problems for underdispersed count data (number of correct responses in a simple task; the dispersion ratio is 0.3), which I modeled with a Conway-Maxwell-Poisson distribution. The residuals of my model look normally distributed, but there is what looks like a nonlinear pattern in the residuals-vs-fitted-values plot, at least to my rookie eye. These are the diagnostic plots generated by `performance::check_model()`:

[Image: `performance::check_model()` diagnostic plots]
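For reference, a minimal sketch of how these adequacy checks can be produced (the fitted model object `m` is hypothetical):

```r
# Sketch: adequacy checks for a fitted glmmTMB model `m` (name hypothetical)
library(performance)

check_overdispersion(m)  # Pearson-based dispersion test; a ratio well below 1
                         # is consistent with underdispersion
check_model(m)           # the panel of diagnostic plots shown above
```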

Here are the randomized quantile residual diagnostic plots from `DHARMa::simulateResiduals()`, since checking traditional residuals for normality is problematic for discrete response variables:

[Image: DHARMa randomized quantile residual diagnostic plots]
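For what it's worth, DHARMa also provides formal tests on the simulated residuals that go beyond eyeballing the plots; a sketch, again assuming a hypothetical fitted model `m`:

```r
library(DHARMa)

# Randomized quantile residuals via simulation from the fitted model
res <- simulateResiduals(fittedModel = m, n = 1000)

plot(res)            # QQ plot plus residual-vs-predicted quantile curves
testQuantiles(res)   # formal test of nonlinearity in the quantile fits
testDispersion(res)  # simulation-based test of under-/overdispersion
```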

The posterior predictive check doesn't look great either:

[Image: posterior predictive check]

My question to the experts out there is whether it looks as bad as I think, and if so, how you would go about troubleshooting. I understand that nonlinearities in the residuals-vs-fitted-values plot indicate that a higher-order term or an interaction is missing. When I was exploring the data, I plotted the log of my response counts against each of my covariates separately to see whether the relationship was linear or required some transformation (I tried exponential, quadratic, and inverse transformations of my covariates), and I included as many interactions in my maximal model as theoretically made sense and still allowed it to converge. To be honest, it's not always clear that the raw data were better fitted by a nonlinear transformation of the covariate. Here is an example with one of the covariates:

[Image: log response counts plotted against one covariate]
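One way to let the data decide whether a covariate should enter nonlinearly, rather than judging by eye, is to compare a linear term against a spline term. A sketch with hypothetical variable names (`correct`, `x1`, `x2`, `dat`):

```r
library(glmmTMB)
library(splines)

m_lin <- glmmTMB(correct ~ x1 + x2,
                 family = compois(link = "log"), data = dat)
m_ns  <- glmmTMB(correct ~ ns(x1, df = 3) + x2,  # natural cubic spline on x1
                 family = compois(link = "log"), data = dat)

anova(m_lin, m_ns)   # likelihood ratio test: does the spline improve the fit?
AIC(m_lin, m_ns)
```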

I fitted my Conway-Maxwell-Poisson model using `glmmTMB::glmmTMB` in R with the option `family = compois(link = "log")`. I started with a "max" model with interaction terms and random-effect terms, used `buildmer::buildglmmTMB()` to step backwards, and used the likelihood ratio test to arrive at a model that has some interactions but no random-effect terms (I know this is arguably not an optimal way of approaching regression modelling... I would love to hear alternatives).
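For concreteness, the workflow described above might look roughly like this (all variable names hypothetical; the formula is a stand-in for the actual maximal model):

```r
library(glmmTMB)
library(buildmer)

# Maximal model: interactions plus a random intercept per subject,
# reduced by backward stepwise elimination using likelihood ratio tests
m_step <- buildglmmTMB(correct ~ x1 * x2 + (1 | subject),
                       data = dat,
                       family = compois(link = "log"),
                       buildmerControl = buildmerControl(crit = "LRT"))

summary(m_step@model)  # the reduced model retained by the elimination
```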

Thanks a ton in advance for any help/advice/hints/words of wisdom!

Gina

  • It does not make sense to check residuals of a count data model for normality. How many data points do you have? Judging from your description, you have done a fair bit of data dredging, which any test for goodness of fit would reflect (i.e., it will be biased towards attesting a better GoF than you "actually" have). If you have enough data, you could try cross-validation of either the MSE if you are only interested in the expectation, or a full proper scoring rule. – Stephan Kolassa Jan 18 '21 at 14:15
  • Thanks, Stephan. I had come across suggestions to look at RQR for discrete variables – I had used `DHARMa::simulateResiduals()` (original post now edited) but am not sure how to read it – and that nonlinear pattern is there. Isn't the residuals-vs-fitted-values plot an appropriate check, and different from checking residuals ~ N? I don't have a great deal of data points – 30 sets of data points per group and 4 groups. I would like to explain/predict accuracy performance in terms of the different covariates and factors in my model and just would like to be assured that I can rely on the results of my model. – gj. Jan 18 '21 at 16:13

0 Answers