I'm reading Doug Bates' theory paper on R's lme4 package to better understand the nitty-gritty of mixed models, and came across an intriguing result that I'd like to understand better, about using restricted maximum likelihood (REML) to estimate variance.
In section 3.3 on the REML criterion, he states that the use of REML in variance estimation is closely related to the use of a degrees-of-freedom correction when estimating variance from residual deviations in a fitted linear model. In particular, "although not usually derived this way", the degrees-of-freedom correction can be derived by estimating the variance through optimization of a "REML criterion" (Eq. (28)). The REML criterion is essentially just the likelihood, but with the linear fit parameters eliminated by marginalizing them out (rather than setting them equal to their fitted estimates, which would give the biased maximum-likelihood variance estimate).
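In symbols (my own sketch of the fixed-effects-only case, not the paper's notation): for $y \sim N(X\beta, \sigma^2 I_n)$ with an $n \times p$ design matrix $X$, integrating the likelihood over $\beta$ gives

$$
L_R(\sigma^2) = \int (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{\lVert y - X\beta \rVert^2}{2\sigma^2}\right) d\beta
= (2\pi\sigma^2)^{-(n-p)/2}\, \lvert X^\top X \rvert^{-1/2} \exp\!\left(-\frac{\mathrm{RSS}}{2\sigma^2}\right),
$$

using $\lVert y - X\beta \rVert^2 = \mathrm{RSS} + (\beta - \hat\beta)^\top X^\top X (\beta - \hat\beta)$. Setting $\partial \log L_R / \partial \sigma^2 = 0$ yields $\hat\sigma^2_{\mathrm{REML}} = \mathrm{RSS}/(n - p)$, the corrected estimator, whereas plugging in $\hat\beta$ instead of integrating yields the biased $\mathrm{RSS}/n$.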
I did the math and verified the claimed result for a simple linear model with only fixed effects. What I'm struggling with is the interpretation. Is there some perspective from which it is natural to derive a variance estimate by optimizing a likelihood with the fit parameters marginalized out? It feels vaguely Bayesian, as though I were treating the likelihood as a posterior and integrating out the fit parameters as if they were random variables.
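For what it's worth, the result is also easy to check numerically. Here is a quick Python sketch (my own made-up data, not anything from the paper): numerically maximizing the marginalized likelihood over $\sigma^2$ recovers RSS/(n−p), while the profiled/ML estimate is the smaller RSS/n.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
n, p = 50, 3

# Toy fixed-effects-only linear model: intercept plus two covariates
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.5, size=n)

# Ordinary least squares fit and residual sum of squares
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(np.sum((y - X @ beta_hat) ** 2))

def neg2_log_reml(s2):
    # -2 * log of the likelihood with beta integrated out:
    # (n - p) * log(2*pi*s2) + rss / s2 + log|X'X|; the log|X'X|
    # term does not depend on s2, so it is dropped here.
    return (n - p) * np.log(2 * np.pi * s2) + rss / s2

res = minimize_scalar(neg2_log_reml, bounds=(1e-6, 100.0), method="bounded")
sigma2_reml = res.x

sigma2_ml = rss / n               # plug in beta_hat: biased low
sigma2_corrected = rss / (n - p)  # usual degrees-of-freedom correction

print(sigma2_reml, sigma2_corrected, sigma2_ml)
```

The optimizer lands on the corrected estimator to within its tolerance, matching the closed-form calculation.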
Or is the justification primarily just mathematical: it works in the linear case and also generalizes?