The random-effects part is there because you recognise there is some (possibly group-related) structure in your errors/residuals. The random effects should be motivated by the research question; otherwise one is simply cherry-picking an error structure, trying to "squeeze more significance out of the remaining terms" (glmm-wiki).
Having said the above, and more specifically for your case: using likelihood-based methods (such as AICc) to compare two models with different fixed effects that were fitted by REML (rather than ML) will generally produce meaningless results, because the REML criterion is built from residual contrasts that depend on the fixed-effects design matrix, so the restricted likelihoods of the two models are not on a comparable scale. Check Faraway's *Extending the Linear Model with R* for details; I am fairly sure Zuur et al.'s *Mixed Effects Models and Extensions in Ecology with R* makes the same point, so I am somewhat surprised. For that reason I am not entirely sure what you mean by using the full fixed-effect structure. If the two models share a common fixed-effect structure, then you are in the clear using REML, but that brings us back to selecting the random effects, where, as noted above, things can get iffy quickly. I would argue that a parametric bootstrap is the proper thing to do, at least as a first step (see the sketch below).
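For concreteness, here is a minimal `lme4`/`pbkrtest` sketch of both points: refit with ML before comparing fixed effects by AIC, and use a parametric bootstrap for the comparison. All variable names (`y`, `x`, `year`, `collection`) are placeholders I made up, not from your question.

```r
library(lme4)
library(pbkrtest)   # parametric bootstrap for merMod comparisons

## Simulated placeholder data, just so the sketch runs.
set.seed(1)
dat <- data.frame(
  collection = factor(rep(1:20, each = 10)),
  year       = rep(1:10, times = 20),
  x          = rnorm(200)
)
dat$y <- 0.5 * dat$x + 0.1 * dat$year +
  rnorm(20)[dat$collection] + rnorm(200)

## Fit the nested models with ML (REML = FALSE) so their likelihoods
## -- and hence AIC (or AICc via, e.g., MuMIn::AICc) -- are comparable
## even though the fixed-effect structures differ.
m_full <- lmer(y ~ x + year + (1 | collection), data = dat, REML = FALSE)
m_red  <- lmer(y ~ x        + (1 | collection), data = dat, REML = FALSE)

AIC(m_full, m_red)   # valid under ML; not valid under REML

## Parametric bootstrap: PBmodcomp simulates from the smaller model
## and refits both, giving a reference distribution for the LRT.
PBmodcomp(m_full, m_red, nsim = 500)
```

Note that `anova()` on two REML-fitted `merMod` objects will silently refit them with ML for you, which is another safeguard against this mistake.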
For your final question: it would not be bad form to use year as a fixed effect. Nevertheless, if the `(year | collection)` random-effects structure makes the most sense conceptually, use it. It reflects a reasonable assumption: you control for a time-evolving trend that everyone expects, end of story. No $p$-value, likelihood-ratio test, etc. trumps your understanding of the problem at hand (both specifications are sketched below). You might want to comment on the reasons certain terms appear statistically insignificant, but that is another question. Check this excellent thread on "what is the upside of treating a factor as random in a mixed model?"; I think it will aid your understanding further.
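To make the two specifications concrete, here is a sketch reusing the placeholder `dat` from above (again, all names are illustrative, not from your question):

```r
library(lme4)

## year as a fixed effect: one common time trend, estimated as a
## single coefficient shared by all collections.
m_fixed <- lmer(y ~ x + year + (1 | collection), data = dat)

## year as a random slope: each collection gets its own trend, and you
## model the distribution of those trends across collections.
m_rand  <- lmer(y ~ x + (year | collection), data = dat)

## The two are not mutually exclusive: an overall trend plus
## collection-specific deviations from it is often the most natural.
m_both  <- lmer(y ~ x + year + (year | collection), data = dat)
```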