
A particular section of the mgcv documentation gives multiple methods of incorporating random effects into a generalized additive model. Two of the methods are 1) adding a smooth term in the class labels using bs="re" in gam; and 2) using the function gamm, which combines facilities similar to those of lme with the existing functions for gam. However, on simulated data the two give quite different model fits. Why is that, and which one should be used?

library(mgcv)

## simulate 200 subjects (IDs) with 5 observations each
x <- rnorm(1000)
ID <- rep(1:200, each = 5)
y <- x
## add a random intercept per subject, then residual noise
for (i in 1:200) y[ID == i] <- y[ID == i] + rnorm(1)
y <- y + rnorm(1000)
ID <- as.factor(ID)

## method 1: random effect as a "re" smooth in gam()
m1 <- gam(y ~ x + s(ID, bs = "re"))
## method 2: gamm(), which fits the model via lme()
m2 <- gamm(y ~ x, random = list(ID = ~1))
mean((fitted(m1) - fitted(m2$gam))^2)
linksys

1 Answer


I suspect the difference is in what fitted values you are getting. If you look at what I would call the model fit (the coefficient estimates and variance terms), the models are identical. Compare summary(m2$lme) with summary(m1) and gam.vcomp(m1).

> summary(m1)

Family: gaussian 
Link function: identity 

Formula:
y ~ x + s(ID, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.05234    0.07932    0.66     0.51    
x            1.01375    0.03535   28.68   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
        edf Ref.df     F p-value    
s(ID) 167.1    199 5.243  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.674   Deviance explained = 72.9%
GCV = 1.2133  Scale est. = 1.0082    n = 1000
> summary(m2$lme)
Linear mixed-effects model fit by maximum likelihood
 Data: strip.offset(mf) 
       AIC     BIC    logLik
  3218.329 3237.96 -1605.165

Random effects:
 Formula: ~1 | ID
        (Intercept) Residual
StdDev:    1.025306 1.003452

Fixed effects: y.0 ~ X - 1 
                 Value  Std.Error  DF   t-value p-value
X(Intercept) 0.0523358 0.07922717 799  0.660578  0.5091
Xx           1.0137531 0.03535887 799 28.670404  0.0000
 Correlation: 
   X(Int)
Xx 0.014 

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.80375873 -0.67702485  0.04245145  0.64026891  2.59257295 

Number of Observations: 1000
Number of Groups: 200 

We see that the estimates are $\hat{\beta}_x = 1.01375$ and $\hat{\beta}_0 \approx 0.05$ in both models. Note also that the estimate of the between-subject/group variance (as a standard deviation), $\hat{\sigma}_{\mathrm{ID}}$, is given as 1.025 in the output from summary(m2$lme). The same information can be computed from the gam model using gam.vcomp(), which gives for m1

> gam.vcomp(m1)
   s(ID) 
1.027795

which is near enough a match for us to not worry about it.

Hence the fitted() methods must be returning different fitted values; if we generate fitted values from m2$lme, we get the same values as those produced by fitted(m1):

> mean((fitted(m1)-fitted(m2$lme))^2) 
[1] 2.966927e-07

which is for all intents and purposes 0.

fitted.lme is documented to return contributions from the population level (averaging over ID) and from the subject-specific components. This is what fitted.gam will be doing for m1 because it represents the random effect as a spline "fixed" effect. In the case of the gamm model, fitted.gam is returning fitted values for the "fixed" effect part of the model only, which would explain the difference. (I'm writing "fixed" because with splines the terms "fixed" and "random" effects get a little blurred.)
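We can check this directly via the level argument of fitted.lme(), where level 0 gives population-level predictions (fixed effects only) and level 1 adds in the estimated ID effects. A quick sketch of the check, continuing with the objects from above:

## level = 0 drops the ID effects, so should match fitted(m2$gam)
mean((fitted(m2$lme, level = 0) - fitted(m2$gam))^2)
## level = 1 includes the ID effects, so should match fitted(m1)
mean((fitted(m2$lme, level = 1) - fitted(m1))^2)

Both differences should be essentially zero.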

In his book, Simon Wood mentions this issue in the example fit to the Sole data using gamm(). He talks about using resid() and fitted() on the $lme component of the gamm model as excluding and including the random effects, respectively. He says this is appropriate for model diagnostics in the specific example he is using.

Which you need will depend on the context of your specific usage/research question.

If all you need are simple random effects like this and you are familiar with GAMs and mgcv then it might be simpler all round to just use the random effect spline basis with gam() rather than having to deal with the weird output of the hybrid that is a GAMM model fitted via gamm(). As I've shown above, the two models are effectively equivalent and the difference you report is just down to whether the fitted values include or exclude the subject (or ID) specific effects.
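For example, assuming a reasonably recent version of mgcv (in which predict.gam() has an exclude argument; that argument is the only thing here not shown above), you can get both kinds of fitted values from the single gam() fit:

## subject-specific fitted values (include the estimated ID effects)
fit_subject <- fitted(m1)
## population-level predictions: drop the ranef smooth by excluding it;
## the label must match the smooth's name as printed in summary(m1)
fit_population <- predict(m1, exclude = "s(ID)")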

Gavin Simpson
  • Thank you Gavin. So does this mean that the `$gam` refers to a totally different model from the `$lme`? Somehow I thought these were just different representations of the same fit. Apparently that is not true. The target of my inference is the gam, not the lme model. – linksys Feb 22 '16 at 18:51
  • This line in the documentation is particularly concerning: `gamm assumes that you know what you are doing! For example, unlike glmmPQL from MASS it will return the complete lme object from the working model at convergence of the PQL iteration, including the 'log likelihood', even though this is not the likelihood of the fitted GAMM.` So, I cannot use the `lme` likelihoods to make inference about the `gam` (e.g. comparing two sub-models using the LRT)? – linksys Feb 22 '16 at 18:52
  • From what I recall, the `$gam` is a version of the model, but from what I can see the `$gam` component doesn't include the random effects part in the fitted values. – Gavin Simpson Feb 22 '16 at 18:53
  • Regarding your edit, I could use `gam()` but my concern is how slow it is, which is a substantial problem in my real data application (about 800 unique IDs, about 5-6 observations per ID). I don't care about the random effects, per se; I'm just trying to do my due diligence. – linksys Feb 22 '16 at 18:55
  • @linksys That quote about PQL only applies to non-Gaussian fits; are you fitting a non-Gaussian model? If you are, I'd probably look elsewhere, e.g. the **gamm4** package (also by Simon Wood), for this *if* you need fancy random effects. If you don't need fancy random effects, `gam()` should work just fine - it'll get bogged down if you have complex/nested random effects or lots of them, as it isn't programmed to exploit the sparseness of the model matrix of the random effects. – Gavin Simpson Feb 22 '16 at 18:55
  • In my real data application, I do have a binary outcome. I was just using the linear case for an example, because the same thing happens. I do not prefer `gamm4` because you cannot include `te()` terms, and my primary hypotheses involve testing for interaction using models like `te(x1) + te(x2) + te(x1,x2)`, which, if I understand correctly, requires you to use `te` and not `s`. – linksys Feb 22 '16 at 18:58
  • Then don't use PQL - it is biased when you have large SDs, especially in binomial data with few observations per subject/ID. You can use tensor products via `t2()` terms (read `?gamm4`), which uses new research by Simon Wood, Fabian Scheipl and Julian Faraway (paper in 2013). But how slow is slow? It might not be worth learning a new package and model representation (gamm4 uses lme4, which is quite different under the hood to `lme()` in nlme). – Gavin Simpson Feb 22 '16 at 19:07
  • "Slow" is at least 15 minutes to fit (I stopped it before it finished). I am familiar with lme4, so the syntax is not a concern. If I understand you correctly, then `t2` can be used equivalently to `ti` or `te`, which alleviates my entire concern with using `gamm4`. Also, using `gam` does produce odd inconsistencies in terms of the number of parameters in the model, which concerns me (that's the subject of another question I posted). So, within `gamm4`, I can compare models using the `logLik` of the `lme4` piece of the model? – linksys Feb 22 '16 at 19:13
  • `t2` is equivalent to `te` - I don't think there is an equivalent of `ti` that can be used in `gamm4()`, but you could try and see, as I've not used `gamm4` much. You might be hitting memory issues (see if your RAM usage maxes out when you fit using `gam()`). But all in all it sounds like `gamm4()` is the way to go. I'm not sure what you mean about the number of parameters, but if you mean the large EDF associated with the ranef spline, isn't this just a reflection of the problem of how many DFs a random effect represents: 1 (for simple RE slopes) or `n`, the number of subjects/IDs? – Gavin Simpson Feb 22 '16 at 19:20
  • I don't explicitly need separable effects, e.g. if it suffices to just compare the model with `s(x1) + s(x2)` vs the model with `s(x1,x2)`, I'm fine with that - that is valid, yes? Re: your other query, I'm not sure, Gavin, but when I include things like `s(ID,bs="re")`, this eats up hundreds of degrees of freedom, whereas in `lmer` the random effects only eat up 1 DF. Also, the likelihoods are completely different. (The two model fits are identical.) By the way, thank you very much for your attention. This has all been great. – linksys Feb 22 '16 at 19:26
  • So the 1 DF for the variance term in GLMMs is one of those open questions; yes, you are estimating a single parameter in this case, hence the 1 extra degree of freedom. But you are also using information on the separate subjects, so we might also reasonably suggest N-1 DFs for a random effect term. This isn't a done deal in statistical theory yet, and the `gam()` model shows why; the ranef spline is a spline, and it has N-1 DFs (199) just as if you'd fitted another complex spline to the data. Which is right? 1 or N-1?? See box 3 in http://dx.doi.org/10.1016/j.tree.2008.10.008 for a summary. – Gavin Simpson Feb 22 '16 at 19:39
  • As for the likelihoods being different, I suspect this is just the use of different normalising constants in the likelihood calculation; you see this a lot between *implementations* of similar models. I would use `s(x1) + s(x2) + t2(x1, x2)` and `s(x1) + s(x2)` as my two models to compare (see the sketch after this thread); you might need to dig into some of the detail though; see `?gam.models`, which has recently been updated to use `ti()` instead of nested `s()` and `te()` terms (as I used above), but it shouldn't be a problem for the model you are fitting. – Gavin Simpson Feb 22 '16 at 19:43
  • The more I think about it, the more I am leaning toward just using `gam()` and incorporating random effects the simple way, and eating the computer time. I was initially against it because I have several models to fit, so this would be several hours at the very least, but I am OK with that. If I can, I'm going to drop all my rep on a bounty for this exchange, Gavin. You've been tremendously helpful. – linksys Feb 22 '16 at 19:47
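A minimal sketch of the model comparison suggested in the comments, assuming the gamm4 package and a binary outcome; the covariates x1 and x2, the grouping factor ID, and the data frame dat are hypothetical placeholders, not objects from the thread:

library(gamm4)

## additive model vs model with a t2() tensor-product interaction;
## both include a simple random intercept for ID
m_add <- gamm4(y ~ s(x1) + s(x2), random = ~ (1 | ID),
               data = dat, family = binomial)
m_int <- gamm4(y ~ s(x1) + s(x2) + t2(x1, x2), random = ~ (1 | ID),
               data = dat, family = binomial)

## compare the lme4 representations of the two fits; treat the
## resulting test as approximate when models differ in smooth terms
anova(m_add$mer, m_int$mer)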