I am analyzing data from 3 field experiments (farms=3) for a citrus flower disease: response variable is binomial because the flower can only be diseased or healthy.
I have particular interest in comparing 5 fungicide spraying systems (trt=5). I am not interested in the effect of a specific farm, they simply represent the total of farms from the region where I want to suggest the best treatments.
Each farm had 4 blocks (bk=4) including 2 trees as subsamples (tree=2) in which I assessed 100 flowers each one.
This is a quick look of the data:
dinc <- within(dinc, { tree_id <- as.factor(interaction(farm, trt, bk, tree)) })
farm trt bk tree dis tot tree_id
<fctr> <fctr> <fctr> <fctr><int> <int> <fctr>
iaras Calendar 1 1 0 100 iaras.Calendar.1.1
iaras Calendar 1 2 1 100 iaras.Calendar.1.2
iaras Calendar 2 1 1 100 iaras.Calendar.2.1
iaras Calendar 2 2 3 100 iaras.Calendar.2.2
The model I considered was:
resp <- with(df, cbind(dis, tot-dis))
m1 = glmer(resp ~ trt + (1|farm/bk) , family = binomial, data=df)
I tested the overdispersion with the overdisp_fun() from GLMM page
chisq ratio p logp
4.191645e+02 3.742540e+00 4.804126e-37 -8.362617e+01
As ratio (residual dev/residual df) > 1, and the p-value < 0.05, I considered to add the observation level random effect (link) to deal with the overdispersion.
so now was added a random effect for each row (tree_id
) to the model, but I am not sure of how to include it. This is my approach:
m2 = glmer(resp ~ trt + (1|farm/bk) + (1|tree_id), family = binomial, data=df)
I also wonder if farm
should be a fixed effect, since it has only 3 levels...
m3 = glmer(resp ~ trt * farm + (1|farm:bk) + (1|tree_id), family = binomial, data=df)
I really appreciate your suggestions about my model specifications...