I'm using glm to predict species richness from different combinations of environmental variables. Species data were collected at 40 to 125 sites in summer over two or three years, depending on the region. These sites were the same across years.

For each of the four regions, I'd like to include year in the model. However, I'm not sure whether a fixed or a random effect is more appropriate. I understand the philosophical argument that a random effect is appropriate for broad inference across years, whereas a fixed effect is appropriate when the particular years in question are of interest. In the former case, I'd specify year as a random intercept: lme4::glmer(richness ~ conifer + shrub + herb + (1|year), data). But with only two or three levels of year, perhaps this is not suitable.

Another consideration is that I am using stepwise model selection to select covariates at different spatial scales. However, glmmLasso, which I tried for this purpose, returned the following error: 'Error in n %*% s : requires numeric/complex matrix/vector arguments'. I suspect that this may be because year has only two or three levels.

Guidance on whether year should be fixed or a random intercept (or included with an alternative approach) is much appreciated. Thank you!

1 Answer

With only two or three levels, you should not include year as a random effect. The reason is covered in this post:
What is the minimum recommended number of groups for a random effects factor?

One point of particular relevance to 'modern' mixed model estimation (rather than 'classical' method-of-moments estimation) is that, for practical purposes, there must be a reasonable number of random-effects levels (e.g. blocks) — more than 5 or 6 at a minimum.

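So if you want to include year in your model, I would suggest adding it as a fixed effect. The model then has no random-effects term, so fit it with glm() rather than lme4::glmer(), which stops with an error when the formula contains none. A Poisson family is assumed here because richness is a count, and year should be a factor (use factor(year) if it is stored as a number):
glm(richness ~ conifer + shrub + herb + year, family = poisson, data = data)
and, if it makes sense for your study, check for interactions with year:
glm(richness ~ (conifer + shrub + herb) * year, family = poisson, data = data)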
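As a rough illustration of the fixed-effect approach, here is a minimal sketch on simulated data. The data frame dat, the year values, the effect sizes, and the Poisson family are all assumptions made for the example; only the column names come from the question. It uses base R glm(), since a model with no random-effects term does not need lme4, and compares the fits with and without year using a likelihood-ratio test.

```r
# Hypothetical simulated data; column names follow the question,
# everything else (years, effect sizes, sample size) is assumed.
set.seed(1)
n_sites <- 40
dat <- data.frame(
  year    = factor(rep(c(2019, 2020, 2021), each = n_sites)),
  conifer = runif(3 * n_sites),
  shrub   = runif(3 * n_sites),
  herb    = runif(3 * n_sites)
)

# Simulate Poisson counts with a small shift in the last year.
eta <- 1.5 + 0.8 * dat$conifer - 0.4 * dat$shrub + 0.2 * (dat$year == "2021")
dat$richness <- rpois(nrow(dat), lambda = exp(eta))

# Year as a fixed, categorical effect (one coefficient per non-reference year).
m_year <- glm(richness ~ conifer + shrub + herb + year,
              family = poisson, data = dat)

# The same model without year, for comparison.
m_none <- glm(richness ~ conifer + shrub + herb,
              family = poisson, data = dat)

# Likelihood-ratio test for the year term as a whole.
lrt <- anova(m_none, m_year, test = "Chisq")
print(lrt)
```

With three years the year term costs two degrees of freedom, which is what the Df column of the comparison table reports. The same anova() call can be used to compare the additive model against the one with year interactions.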