How to build a GLMM that observes the years since experimental design was established?

Question

Hello, this is my first question. It is quite specific, so I find it difficult to relate already answered questions to mine.

I have observed the vegetation development in forests in 5 different areas (area) as part of an experiment. The design is classified by 3 types of disturbance size and distribution (treatment), including a control plot, and by the occurrence of deadwood (deadwood) with 5 types, including a control plot. For each treatment there is every option of deadwood and vice versa. The data have been collected for 4 years (years -> 1y-4y). Five microclimate variables (mc1-5) are included as fixed effects. As random effects I would like to include area, treatment and deadwood. With regard to vegetation development, I am interested in how the effects of microclimate (mc1-5), and especially of treatment and deadwood, have changed over the years. Within area, the factor years should be negligible. As I understand it, years and treatment/deadwood are nested, because the same plots are examined every year.

My previous attempt to build a model:

```r
glmm <- glmer(species.number ~ mc1 + mc2 + mc3 + mc4 + mc5 +
              (1|treatment/years) + (1|deadwood/years) + (1|area),
              data=df, family = poisson)
```

Among other things, I am confused by this (actually very good) answer: could my data be crossed after all?

Thanks a lot!

(+1) Can you provide more details please: how many areas and treatments are there? Is deadwood binary? It sounds like your research question concerns the effect of the mc variables on the number of species, but also how these evolve over time. Is that correct? If not, please explain your research question. I am sorry that my answer you linked to has confused you! — Robert Long, Nov 12 '20 at 16:49
First of all, the confusion is due to my dilettantism in R and GLMMs in particular, definitely not due to your answer. I have edited some information into the question above; I hope it helps. Your assumption regarding the research question is correct! — Ole Herbchandler, Nov 12 '20 at 17:01
Thanks for the info. Can you explain in a bit more detail what the variables `treatment` and `deadwood` actually are? — Robert Long, Nov 12 '20 at 17:27
They are both kind of grouping variables. `treatment` describes the size and distribution of disturbances in forest in one letter: A = aggregated, C = control, D = distributed. `deadwood` describes the occurrence of deadwood in the same way, lying deadwood, standing deadwood, standing + lying, control, removed... — Ole Herbchandler, Nov 12 '20 at 17:59

score 2 · Accepted Answer · answered Nov 12 '20 at 18:47

The central interest is in the "effects" of the microclimate variables, and whether these change over time. As such, time should be treated as a fixed effect. Treating it as random would only allow counts from the same year to be more similar to each other than counts from different years. Interacting year with the microclimate variables will help to answer the research questions.

`area` seems to meet the usual criteria for treating a factor as random: the areas are samples from a greater population of areas, and there is no actual interest in the "effects" of the individual areas. The one exception is the number of areas. 5 is on the cusp of what is generally thought of as the minimum number of levels for treating a factor as random in the frequentist paradigm. It could be argued either way whether it should be random or not.

From the description in the comments, `treatment` and `deadwood` appear to be factors that are crossed, but each nested within `area`. They don't really seem like samples from a wider population. However, this is not crucial, as there are often conflicting criteria (and other criteria I haven't mentioned). Since they are nested within `area`, we don't need to worry about them having only 3 and 5 levels: the grouping factors actually used are `area:treatment` and `area:deadwood`, which have up to 15 and 25 levels respectively.

So the model I would suggest is:

```r
species.number ~ mc1*year + mc2*year + mc3*year + mc4*year + mc5*year + (1|area) + (1|area:treatment) + (1|area:deadwood)
```
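As a rough sketch, this is how that formula could be passed to `lme4::glmer`, assuming a Poisson family as in the question. The data below are simulated noise laid out the way the comments describe (5 areas, 9 plots each, 4 years); the object names `design`, `df` and `fit`, the deadwood labels and all values are placeholders, not the real data.

```r
library(lme4)

set.seed(42)

# Hypothetical plot layout per area: treatments A and D with 4 deadwood
# types each, plus one untouched control plot
design <- data.frame(
  treatment = c(rep("A", 4), rep("D", 4), "C"),
  deadwood  = c(rep(c("lying", "standing", "both", "removed"), 2), "control")
)

# Cartesian product: 5 areas x 4 years x 9 plots = 180 rows
df <- merge(expand.grid(area = paste0("area", 1:5), year = 0:3), design)

# Placeholder microclimate covariates and counts (pure noise, illustration only)
for (v in paste0("mc", 1:5)) df[[v]] <- rnorm(nrow(df))
df$species.number <- rpois(nrow(df), lambda = exp(1.5 + 0.1 * df$year))

fit <- glmer(
  species.number ~ mc1*year + mc2*year + mc3*year + mc4*year + mc5*year +
    (1 | area) + (1 | area:treatment) + (1 | area:deadwood),
  data = df, family = poisson
)

# Fixed-effect estimates; the rows whose names contain a colon are the
# microclimate-by-year interactions
fixef(fit)
```

Because the simulated counts contain no real area, treatment or deadwood structure, `glmer` may report a singular fit here; that is a property of the toy data, not of the formula.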

If you expect a linear association of the counts with time, then it would be a good idea to treat year as numeric (0,1,2,3), particularly if the microclimate variables are categorical, otherwise you will have a lot of output to interpret.
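A small sketch of that recoding, assuming the years are stored as labels of the form `"1y"` to `"4y"` as hinted at in the question (the vector `years` below is made up for illustration):

```r
# Hypothetical year labels as they might appear in the data
years <- c("1y", "2y", "3y", "4y", "2y")

# Strip the trailing "y", convert to numeric, and shift so the first year is 0
year_num <- as.numeric(sub("y$", "", years)) - 1
year_num
```

Starting at 0 means the main effects of the microclimate variables refer to the first survey year, which tends to make the interaction model easier to read.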

Lastly, I would also suggest some consideration of whether there is a causal dependence among any of the fixed or random effects: are all the microclimate variables independent of each other and of the `treatment` and `deadwood` variables?

Thanks so much! For further explanation: Each `area` has the same 9 plots. These 9 plots are each divided into a `treatment` A = aggregated and D = distributed, which have 4 different `deadwood` categories each, plus a C = control plot that remains untouched (in case of `treatment` AND of `deadwood`). The microclimate variables are partly related to each other (for example light intensity and temperature) as well as to the respective `treatment` and `deadwood`. I think this complicates things even more ;) — Ole Herbchandler, Nov 12 '20 at 19:10
The central question is how vegetation changes over time, with special regard to `treatment` / `deadwood` and the resulting microclimate. Furthermore, the effects of `treatment` and `deadwood` also change slightly over time. — Ole Herbchandler, Nov 12 '20 at 19:20
Not to worry :) The main thing to avoid is interpreting the main effects if any of the others are mediators. Mediators should be excluded, and you will need to fit a separate model to assess the impact of the mediators separately. Check my answer [here](https://stats.stackexchange.com/questions/445578/how-do-dags-help-to-reduce-bias-in-causal-inference) for details of how to proceed with that. — Robert Long, Nov 12 '20 at 19:21
Hmm, regarding your 2nd comment, this is making me think that `treatment` and `deadwood` should be fixed effects. — Robert Long, Nov 12 '20 at 19:21
