12

I am trying to calculate the random effect predictions from a linear mixed model by hand, and using notation provided by Wood in Generalized Additive Models: an introduction with R (pg 294 / pg 307 of pdf), I am getting confused over what each parameters represents.

Below is a summary from Wood.

Define a linear mixed model

$$ Y = X\beta + Zb + \epsilon $$

where b $\sim$ N(0, $\psi$), and $\epsilon \sim$ N(0, $\sigma^{2}$)

If b and y are random variables with joint normal distribution

\begin{align*} \begin{bmatrix} b\\ y \end{bmatrix} &\sim N \begin{bmatrix} \begin{bmatrix} 0\\ X\beta \end{bmatrix}\!\!,& \begin{bmatrix} \psi & \Sigma_{by} \\ \Sigma_{yb}& \Sigma_{\theta}\sigma^{2} \end{bmatrix} \end{bmatrix}\\ \end{align*}

The RE predictions are calculated by

\begin{align*} E[b \mid y] &= \Sigma_{by} \Sigma_{yy}^{-1} (y - x\beta)\\ &= \Sigma_{by}\Sigma_{\theta}^{-1}(y - x \beta) / \sigma^{2}\\ &= \psi z^{T}\Sigma_{\theta}^{-1} (y - x \beta) / \sigma^{2} \end{align*}

where $\Sigma_{\theta} = Z\psi Z^{T} /\sigma^{2} + I_{n}$

Using an random intercept model example from lme4 R package I get output

library(lme4)
m = lmer(angle ~ temp + (1 | replicate), data=cake)
summary(m)

% Linear mixed model fit by REML ['lmerMod']
% Formula: angle ~ temp + (1 | replicate)
%    Data: cake
% 
% REML criterion at convergence: 1671.7
% 
% Scaled residuals: 
%      Min       1Q   Median       3Q      Max 
% -2.83605 -0.56741 -0.02306  0.54519  2.95841 
% 
% Random effects:
%  Groups    Name        Variance Std.Dev.
%  replicate (Intercept) 39.19    6.260   
%  Residual              23.51    4.849   
% Number of obs: 270, groups:  replicate, 15
% 
% Fixed effects:
%             Estimate Std. Error t value
% (Intercept)  0.51587    3.82650   0.135
% temp         0.15803    0.01728   9.146
% 
% Correlation of Fixed Effects:
%      (Intr)
% temp -0.903

So from this, I think $\psi$ = 23.51, $(y-X\beta)$ can be estimated from cake$angle - predict(m, re.form=NA), and sigma from the square of the population level residuals.

th = 23.51
zt = getME(m, "Zt") 
res = cake$angle - predict(m, re.form=NA)
sig = sum(res^2) / (length(res)-1)

Multiplying these together gives

th * zt %*% res / sig
         [,1]
1  103.524878
2   94.532914
3   33.934892
4    8.131864
---

which is not correct when compared to

> ranef(m)
$replicate
   (Intercept)
1   14.2365633
2   13.0000038
3    4.6666680
4    1.1182799
---

Why?

amoeba
  • 93,463
  • 28
  • 275
  • 317
user2957945
  • 738
  • 1
  • 7
  • 19

1 Answers1

9

Two problems (I confess it took me like 40 minutes to spot the second one):

  1. You must not compute $\sigma^2$ with the square of residuals, it is estimated by REML as 23.51, and there is no guarantee that the BLUPs will have the same variance.

    sig <- 23.51
    

    And this is not $\psi$ ! Which is estimated as 39.19

    psi <- 39.19
    
  2. The residuals are not obtained with cake$angle - predict(m, re.form=NA) but with residuals(m).

Putting it together:

> psi/sig * zt %*% residuals(m)
15 x 1 Matrix of class "dgeMatrix"
         [,1]
1  14.2388572
2  13.0020985
3   4.6674200
4   1.1184601
5   0.2581062
6  -3.2908537
7  -4.6351567
8  -4.5813846
9  -4.6351567
10 -3.1833095
11 -2.1616392
12 -1.1399689
13 -0.2258429
14 -4.0974355
15 -5.3341942

which is similar to ranef(m).

I really don't get what predict computes.


PS. To answer your last remark, the point is that we use the "residuals" $\hat\epsilon$ as a way to obtain the vector $PY$ where $P = V^{-1} - V^{-1} X \left(X' V^{-1} X\right)^{-1} X' V^{-1}$. This matrix is computed during the REML algorithm. It is related to the BLUPs of random terms by $$ \hat\epsilon = \sigma^2 PY $$ and $$ \hat b = \psi Z^t PY. $$

Thus $ \hat b = \psi / \sigma^2 Z^t \hat\epsilon $.

amoeba
  • 93,463
  • 28
  • 275
  • 317
Elvis
  • 11,870
  • 36
  • 56
  • 1
    Thanks Elvis. I am struggling a wee bit to align the values you have used back to the equations above, however, it seems there are many ways to skin a cat. The residuals I find a bit surprising as I thought it is meant to be $y-x\beta$, (fixed effect) whereas residuals is calculated using the random effects. (see the difference between `plot(residuals(m), cake$angle-predict(m, re.form=NULL)) ; plot(residuals(m), cake$angle-predict(m, re.form=NA))`). – user2957945 Oct 27 '16 at 19:16
  • 1
    A way using the fixed effect, and the third version of the E[b|y] above: `z = getME(m, "Z") ; big_sig = solve(((z * psi) %*% zt ) / sig + diag(270)) ; psi/sig * zt %*% big_sig %*% (cake$angle-predict(m, re.form=NA))`. Thanks for pointing out the correct items. – user2957945 Oct 27 '16 at 19:16
  • One final Q if I may, can I get either of $\Sigma_{by}$ or $\Sigma_{yy}$ directly from the output? – user2957945 Oct 27 '16 at 19:19
  • Isn’t $\Sigma_{yb}$ equal to $\psi Z$ ? – Elvis Oct 27 '16 at 19:54
  • Elvis, I had another wee think on this (I know I'm slow). I think using the residuals like this isn't really sensible as it uses the predicted values (and so residuals) at the RE level to calculate, so we are using it at both sides of your equation. (so it uses the RE predictions (E[b|y]) to make the predictions of residuals even though these are the terms we are trying to predict)) – user2957945 Oct 28 '16 at 07:50
  • @user2957945 I added a PS to explain why it is legit to do so. I hope you will be convinced. – Elvis Oct 28 '16 at 10:45
  • Thanks very much again for your works Elvis, greatly appreciated. I'll go and try to understand. cheers – user2957945 Oct 28 '16 at 10:56
  • Perhaps I'm missing something, but I don't see how this answer works for more than two random effects -- if the dimension of $\psi$ is $2\times 2$, then it isn't conformable with $Z^T$. – generic_user Dec 26 '17 at 20:03
  • $\psi$ is a scalar, the variance of $b$ is $\psi I_q$ where $q$ is the dimension of $b$. Look at the R code in the answer, it should be helpful. – Elvis Dec 26 '17 at 20:59
  • Right, in the example here there is only one random intercept. But if you had more than one random term then $\psi$ wouldn't be a scalar. But since posting this comment, I've figured out that $\psi$ in the general case is block diagonal, with each block being the covariance matrix of the random effects. – generic_user Dec 26 '17 at 21:14
  • $\psi$ is a scalar, period. – Elvis Dec 26 '17 at 21:37
  • If that is true, then I am confused. What happens in a model with more than one random effect, and correlated random effects in particular? – generic_user Dec 26 '17 at 21:42
  • Or, more precisely, it isn't $\psi$, the covariance of the random effects that gets multiplied by $Z^T$, but rather a block-diagonal matrix with $\psi$ making up the blocks... Unless I'm wrong -- these models are complicated and poorly documented. – generic_user Dec 26 '17 at 21:46
  • In that case, you would for example put $var(b) = \psi K$. – Elvis Dec 26 '17 at 21:54