I am currently trying to model the number of agents in a system in which I systematically varied the available space. I figured that I could use a classical Poisson distribution. However, the model does not fit well, and there is some trend in the residuals that I need to address.
The model looks like this:
```r
mod <- glmmTMB(N_Agents ~ space + density1 + density2 + density1:length + density2:length + (1 | ID),
               data = data, family = poisson(link = "log"))  # random intercept per replicate
```
`N_Agents` is my number of agents, and `space` is a four-level factor (my condition, which I treat as a factor even though it is technically numeric); `density1`, `density2` and `length` are continuous variables. However, `density1` and `density2` take only integer values, with just 6 distinct values each (so they are heavily clustered). `ID` is the name of each replicate, and there are 10 replicates for each `space` condition.
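Since the real data are not shown, a minimal sketch of a hypothetical toy dataset with the layout described above may help others reproduce the problem. The number of time points per replicate (20), the level names `S1` to `S4`, the value ranges and the binomial generating process are all assumptions, chosen only so that the counts are underdispersed relative to a Poisson with the same mean.

```r
# Hypothetical toy data mimicking the described layout:
# 4 space levels x 10 replicates (ID) x 20 time points (20 is an assumption).
set.seed(1)
data <- expand.grid(time = 1:20, rep = 1:10,
                    space = factor(c("S1", "S2", "S3", "S4")))
# Each replicate ID belongs to exactly one space condition
data$ID <- factor(paste(data$space, data$rep, sep = "_"))
data$density1 <- sample(0:5, nrow(data), replace = TRUE)  # integers, 6 distinct values
data$density2 <- sample(0:5, nrow(data), replace = TRUE)  # integers, 6 distinct values
data$length <- runif(nrow(data), min = 1, max = 10)
# Binomial counts: variance n*p*(1-p) is below the mean n*p,
# i.e. underdispersed compared with a Poisson
data$N_Agents <- rbinom(nrow(data), size = 20,
                        prob = 0.25 + 0.05 * as.integer(data$space))
str(data)
```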
Using DHARMa to diagnose my residuals, `testDispersion` indicates underdispersion, with `ratioObsSim = 0.7`. `testUniformity` is also significant (D = 0.04, p = 0.0007). No outliers are detected (outlier frequency = 0), which may be an additional sign of underdispersion, although that test is not significant (p = 0.184, two-sided). No zero-inflation is detected either.
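As an independent cross-check of the DHARMa result, the classical Pearson dispersion statistic (sum of squared Pearson residuals divided by the residual degrees of freedom) can be computed. A minimal base-R sketch, assuming simulated binomial counts and a plain `glm` without the random effect for simplicity; this is not the same statistic DHARMa reports, but values clearly below 1 point in the same direction (underdispersion).

```r
# Pearson dispersion statistic relative to the fitted Poisson model:
# < 1 suggests underdispersion, > 1 suggests overdispersion.
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

set.seed(42)
x <- rep(c("a", "b", "c", "d"), each = 250)  # hypothetical 4-level factor
mu <- c(a = 4, b = 5, c = 6, d = 7)[x]       # group means
# Binomial counts with these means: variance/mean = 1 - p < 1
y <- rbinom(length(x), size = 10, prob = mu / 10)
fit <- glm(y ~ factor(x), family = poisson(link = "log"))
disp <- pearson_dispersion(fit)
print(disp)
```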
Despite the plethora of information on overdispersed data, I could not find much on underdispersion. I understand that I could use a Conway-Maxwell-Poisson or a generalized Poisson distribution for these data, but neither seems to fix the dispersion when naively plugged into my model (i.e. `family = compois(link = "log")`). I also suspect lag-1 autocorrelation in my data (that is, the number of agents at time t should be influenced by the number of agents at time t-1), but I think I should get the basic model fit right before digging into that.
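Before adding an autocorrelation structure, a quick base-R check of the lag-1 correlation of the residuals within each replicate could show whether the suspicion holds. The helper `lag1_within()` is hypothetical (not part of any package); it only pairs consecutive time points inside the same replicate, never across replicates. The demonstration uses simulated AR(1) series and independent noise in place of real residuals.

```r
# Pooled lag-1 correlation of x within groups, ordered by time.
# Pairs (x[t-1], x[t]) are formed only inside the same group.
lag1_within <- function(x, group, time) {
  ord <- order(group, time)
  x <- x[ord]
  group <- group[ord]
  same <- group[-1] == group[-length(group)]  # consecutive rows in the same group
  prev <- x[-length(x)][same]
  curr <- x[-1][same]
  cor(prev, curr)
}

set.seed(7)
ids <- rep(1:40, each = 20)    # 40 hypothetical replicates
tt <- rep(1:20, times = 40)    # 20 time points each
# Simulated AR(1) "residuals" (coefficient 0.7) vs. independent noise
z <- unlist(lapply(1:40, function(i) as.numeric(arima.sim(list(ar = 0.7), n = 20))))
r_ar <- lag1_within(z, ids, tt)
r_iid <- lag1_within(rnorm(800), ids, tt)
print(c(ar1 = r_ar, iid = r_iid))
```

With a fitted model, one would call it as `lag1_within(residuals(mod, type = "pearson"), data$ID, data$time)`, where `time` is an assumed column holding the time step. If the correlation is clearly non-zero, glmmTMB's `ar1()` covariance structure, e.g. a term `ar1(time_f + 0 | ID)` with `time_f` a factor of equally spaced time points, is one documented way to model it.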
Is a dispersion ratio of 0.72 good enough, or should I aim for at least 0.90? If the latter, are the `compois` or `genpois` families appropriate for my model? And if so, should I add some kind of `dispformula` to my model to properly account for the underdispersion?
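A sketch of how these alternatives could be compared in glmmTMB, on simulated toy data (an assumption, as the real data are not shown): a Poisson fit, a generalized Poisson (`genpois`) fit, a Conway-Maxwell-Poisson (`compois`) fit, and a `genpois` fit with `dispformula = ~ space`, which lets the dispersion parameter vary by space condition (choosing `space` there is purely illustrative). Comparing AIC and re-running DHARMa on the candidates is one way to judge whether the extra parameters help; this assumes glmmTMB's `simulate()` supports these families, which I believe recent versions do.

```r
library(glmmTMB)
library(DHARMa)

# Hypothetical toy data with the described layout (values are simulated)
set.seed(1)
data <- expand.grid(time = 1:20, rep = 1:10,
                    space = factor(c("S1", "S2", "S3", "S4")))
data$ID <- factor(paste(data$space, data$rep, sep = "_"))
data$density1 <- sample(0:5, nrow(data), replace = TRUE)
data$density2 <- sample(0:5, nrow(data), replace = TRUE)
data$length <- runif(nrow(data), min = 1, max = 10)
data$N_Agents <- rbinom(nrow(data), size = 20,
                        prob = 0.25 + 0.05 * as.integer(data$space))

f <- N_Agents ~ space + density1 + density2 + density1:length +
  density2:length + (1 | ID)

m_pois <- glmmTMB(f, data = data, family = poisson(link = "log"))
m_gp   <- glmmTMB(f, data = data, family = genpois(link = "log"))
# dispformula models the dispersion parameter; here it may differ by space
m_gp_d <- glmmTMB(f, data = data, family = genpois(link = "log"),
                  dispformula = ~ space)
m_cmp  <- glmmTMB(f, data = data, family = compois(link = "log"))

print(AIC(m_pois, m_gp, m_gp_d, m_cmp))

# Re-check the simulated residuals of one candidate model
res_gp <- simulateResiduals(m_gp)
disp_gp <- testDispersion(res_gp, plot = FALSE)
print(disp_gp)
```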
Sorry for the many questions; I am no expert in statistics, so I am not sure whether this is the right way to think about my problem. Any help would be greatly appreciated!
P.S. I can provide some graphs if needed.