I'm reading Doug Bates' theory paper on R's lme4 package to better understand the nitty-gritty of mixed models, and came across an intriguing result that I'd like to understand better, about using restricted maximum likelihood (REML) to estimate variance.
In section 3.3 on the REML criterion, he states that the use of REML in variance estimation is closely related to the use of a degrees-of-freedom correction when estimating variance from residual deviations in a fitted linear model. In particular, "although not usually derived this way", the degrees-of-freedom correction can be derived by estimating the variance through optimization of a "REML criterion" (Eq. (28)). The REML criterion is essentially just the likelihood, but with the linear fit parameters eliminated by marginalizing them out (rather than setting them equal to their fitted estimates, which would give the biased maximum-likelihood variance estimate).
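In symbols (my own sketch of the fixed-effects-only case, not the paper's notation): for $y \sim N(X\beta, \sigma^2 I_n)$ with an $n \times p$ design matrix $X$, integrating the likelihood over $\beta$ gives

$$
L_R(\sigma^2) = \int (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{\lVert y - X\beta \rVert^2}{2\sigma^2}\right) d\beta
= (2\pi\sigma^2)^{-(n-p)/2}\, \lvert X^\top X \rvert^{-1/2} \exp\!\left(-\frac{\mathrm{RSS}}{2\sigma^2}\right),
$$

using $\lVert y - X\beta \rVert^2 = \mathrm{RSS} + (\beta - \hat\beta)^\top X^\top X (\beta - \hat\beta)$. Setting $\partial \log L_R / \partial \sigma^2 = 0$ yields $\hat\sigma^2_{\mathrm{REML}} = \mathrm{RSS}/(n - p)$, the corrected estimator, whereas plugging in $\hat\beta$ instead of integrating yields the biased $\mathrm{RSS}/n$.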
I did the math and verified the claimed result for a simple linear model with only fixed effects. What I'm struggling with is the interpretation. Is there some perspective from which it is natural to derive a variance estimate by optimizing a likelihood with the fit parameters marginalized out? It feels vaguely Bayesian, as though I were treating the likelihood as a posterior and integrating out the fit parameters as if they were random variables.
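For what it's worth, the result is also easy to check numerically. Here is a quick Python sketch (my own made-up data, not anything from the paper): numerically maximizing the marginalized likelihood over $\sigma^2$ recovers RSS/(n−p), while the profiled/ML estimate is the smaller RSS/n.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
n, p = 50, 3

# Toy fixed-effects-only linear model: intercept plus two covariates
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.5, size=n)

# Ordinary least squares fit and residual sum of squares
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(np.sum((y - X @ beta_hat) ** 2))

def neg2_log_reml(s2):
    # -2 * log of the likelihood with beta integrated out:
    # (n - p) * log(2*pi*s2) + rss / s2 + log|X'X|; the log|X'X|
    # term does not depend on s2, so it is dropped here.
    return (n - p) * np.log(2 * np.pi * s2) + rss / s2

res = minimize_scalar(neg2_log_reml, bounds=(1e-6, 100.0), method="bounded")
sigma2_reml = res.x

sigma2_ml = rss / n               # plug in beta_hat: biased low
sigma2_corrected = rss / (n - p)  # usual degrees-of-freedom correction

print(sigma2_reml, sigma2_corrected, sigma2_ml)
```

The optimizer lands on the corrected estimator to within its tolerance, matching the closed-form calculation.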
Or is the justification primarily just mathematical: it works in the linear case and also generalizes?