
There's a distinction that's tripping me up with mixed models, and I'm wondering if I could get some clarity on it. Let's assume you've got a mixed model of count data. There's a variable you know you want as a fixed effect (A) and another variable for time (T), grouped by say a "Site" variable.

As I understand it:

glm(counts ~ A + T, data=data, family=poisson) is a fixed effects model (glmer itself refuses a formula with no random-effects term).

glmer(counts ~ (A + T | Site), data=data, family=poisson) is a random effect model.

My question is when you have something like:

glmer(counts ~ A + T + (T | Site), data=data, family=poisson) what is T? Is it a random effect? A fixed effect? What's actually being accomplished by putting T in both places?

When should something only appear in the random effects section of the model formula?

Fomite

3 Answers


This may become clearer by writing out the model formula for each of these three models. Let $Y_{ij}$ be the observation for person $i$ in site $j$ in each model and define $A_{ij}, T_{ij}$ analogously to refer to the variables in your model.

glm(counts ~ A + T, data=data, family=poisson) (glmer needs at least one random-effects term, so this one is a plain glm) is the model

$$ \log \big( E(Y_{ij}) \big) = \beta_0 + \beta_1 A_{ij} + \beta_2 T_{ij} $$

which is just an ordinary Poisson regression model.

glmer(counts ~ (A + T|Site), data=data, family=poisson) is the model

$$ \log \big( E(Y_{ij}) \big) = \alpha_0 + \eta_{j0} + \eta_{j1} A_{ij} + \eta_{j2} T_{ij} $$

where $\eta_{j} = (\eta_{j0}, \eta_{j1}, \eta_{j2}) \sim N(0, \Sigma)$ are random effects that are shared by every observation made on individuals from site $j$. These random effects are allowed to be freely correlated (i.e. no restrictions are placed on $\Sigma$) in the model you specified. To impose independence, you have to place them inside different brackets, e.g. (A-1|Site) + (T-1|Site) + (1|Site) would do it. This model assumes that, averaged over sites, $\log \big( E(Y_{ij}) \big)$ is $\alpha_0$, but each site has a random offset ($\eta_{j0}$) and its own random linear relationship with both $A_{ij}$ and $T_{ij}$, with no average slope for either.
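To spell out the two covariance structures, here is a sketch in notation that is not part of the model above: the $\sigma$ symbols are introduced only for illustration, with index 0 for the intercept, 1 for $A$ and 2 for $T$.

$$ \Sigma_{\text{correlated}} = \begin{pmatrix} \sigma_0^2 & \sigma_{01} & \sigma_{02} \\ \sigma_{01} & \sigma_1^2 & \sigma_{12} \\ \sigma_{02} & \sigma_{12} & \sigma_2^2 \end{pmatrix}, \qquad \Sigma_{\text{independent}} = \begin{pmatrix} \sigma_0^2 & 0 & 0 \\ 0 & \sigma_1^2 & 0 \\ 0 & 0 & \sigma_2^2 \end{pmatrix} $$

So (A + T|Site) estimates six variance-covariance parameters (three variances and three covariances), while the separate-bracket version estimates only the three variances.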

glmer(counts ~ A + T + (T|Site), data=data, family=poisson) is the model

$$ \log \big( E(Y_{ij}) \big) = (\theta_0 + \gamma_{j0}) + \theta_1 A_{ij} + (\theta_2 + \gamma_{j1}) T_{ij} $$

So now $\log \big( E(Y_{ij}) \big)$ has some "average" relationship with $A_{ij}, T_{ij}$, given by the fixed effects $\theta_0, \theta_1, \theta_2$, but the baseline and the relationship with $T_{ij}$ are different for each site, and those differences are captured by the random effects $\gamma_{j0}, \gamma_{j1}$. That is, the baseline is randomly shifted and the slope of $T$ is randomly shifted, and everyone from the same site shares the same random shifts. The slope of $A$ is the same at every site, because $A$ does not appear inside the (T|Site) term.
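To make this third model concrete, here is a minimal simulation sketch in R. Everything numeric in it is an assumption made up for illustration (30 sites, 20 observations per site, the values in `theta` and `Sigma`), and `time_var` is just a helper name for the $T$ column. The final fit only runs if the lme4 package is installed.

```r
set.seed(1)

n_site <- 30   # hypothetical number of sites
n_per  <- 20   # hypothetical number of observations per site

site     <- rep(seq_len(n_site), each = n_per)
A        <- rnorm(n_site * n_per)
time_var <- rep(seq(0, 1, length.out = n_per), times = n_site)

# Fixed effects theta_0, theta_1, theta_2 (illustrative values only)
theta <- c(0.5, 0.3, -0.4)

# Covariance of (gamma_j0, gamma_j1); the off-diagonal lets them correlate
Sigma <- matrix(c(0.25, 0.05,
                  0.05, 0.16), nrow = 2, byrow = TRUE)

# One row of random effects per site: standard normals times chol(Sigma)
gamma <- matrix(rnorm(n_site * 2), ncol = 2) %*% chol(Sigma)

# Linear predictor: site-specific intercept and T slope, common A slope
log_mu <- (theta[1] + gamma[site, 1]) +
  theta[2] * A +
  (theta[3] + gamma[site, 2]) * time_var

dat <- data.frame(counts = rpois(length(log_mu), exp(log_mu)),
                  A = A, T = time_var, Site = factor(site))

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(counts ~ A + T + (T | Site), data = dat, family = poisson)
  print(lme4::fixef(fit))    # estimates of theta_0, theta_1, theta_2
  print(lme4::VarCorr(fit))  # estimate of Sigma, shown as SDs and a correlation
}
```

The fixed-effect estimates correspond to the $\theta$'s and the variance components to the distribution of $(\gamma_{j0}, \gamma_{j1})$, which is one way to see that $T$ contributes both an average slope and a site-level deviation from it.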

what is T? Is it a random effect? A fixed effect? What's actually being accomplished by putting T in both places?

$T$ is one of your covariates. It is not a random effect - Site is a random effect. $T$ has a fixed coefficient, $\theta_2$, and each site's coefficient deviates from it by the random effect conferred by Site - $\gamma_{j1}$ in the model above. What is accomplished by including this random effect is to allow for heterogeneity between sites in the relationship between $T$ and $\log \big( E(Y_{ij}) \big)$.

When should something only appear in the random effects section of the model formula?

This is a matter of what makes sense in the context of the application.

Regarding the intercept - you should keep the fixed intercept in there for a lot of reasons (see, e.g., here); re: the random intercept, $\gamma_{j0}$, this primarily acts to induce correlation between observations made at the same site. If it doesn't make sense for such correlation to exist, then the random effect should be excluded.
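As a sketch of how the site-level random effects induce this correlation, write $\ell_{ij} = (\theta_0 + \gamma_{j0}) + \theta_1 A_{ij} + (\theta_2 + \gamma_{j1}) T_{ij}$ for the linear predictor in the third model and treat the covariates as fixed. The symbols $\ell_{ij}$, $\sigma_0^2 = \mathrm{Var}(\gamma_{j0})$, $\sigma_1^2 = \mathrm{Var}(\gamma_{j1})$ and $\sigma_{01} = \mathrm{Cov}(\gamma_{j0}, \gamma_{j1})$ are my own notation for this illustration. For two different observations $i \neq i'$ at the same site $j$,

$$ \mathrm{Cov}(\ell_{ij}, \ell_{i'j}) = \sigma_0^2 + \sigma_{01} (T_{ij} + T_{i'j}) + \sigma_1^2 T_{ij} T_{i'j} $$

With only a random intercept this reduces to $\sigma_0^2$, and for observations at different sites it is zero if the sites' random effects are independent, which is the usual assumption. Note that this is the covariance on the log scale, not the covariance of the counts themselves.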

Regarding the random slopes, a model with only random slopes and no fixed slopes reflects a belief that, within each site, there is some relationship between $\log \big( E(Y_{ij}) \big)$ and your covariates, but if you average those effects over all sites, then there is no relationship. For example, if you had a random slope in $T$ but no fixed slope, this would be like saying that time, on average, has no effect (e.g. no secular trends in the data) but each Site is heading in a random direction over time, which could make sense. Again, it depends on the application.

Note that you can fit the model with and without random effects to see if this is happening - you should see no effect in the fixed model but significant random effects in the subsequent model. I must caution you that decisions like this are often better made based on an understanding of the application rather than through model selection.
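A rough sketch of that check in R follows. The data are simulated under the assumption that the average time slope is exactly zero while each site has its own slope. The sample sizes, the slope standard deviation of 0.5, and the names `slope_j`, `fixed_fit`, `no_slope` and `rand_slope` are all hypothetical. The mixed-model part runs only if lme4 is installed.

```r
set.seed(2)

n_site <- 40   # hypothetical number of sites
n_per  <- 15   # hypothetical number of observations per site

site     <- rep(seq_len(n_site), each = n_per)
time_var <- rep(seq(-1, 1, length.out = n_per), times = n_site)  # centred time

# Site-specific slopes with mean zero: no average trend, but site-level trends
slope_j <- rnorm(n_site, mean = 0, sd = 0.5)

dat <- data.frame(
  counts = rpois(n_site * n_per, exp(1 + slope_j[site] * time_var)),
  T      = time_var,
  Site   = factor(site)
)

# Fixed-effects-only fit: the coefficient of T estimates the average trend
fixed_fit <- glm(counts ~ T, data = dat, family = poisson)
print(coef(summary(fixed_fit)))

if (requireNamespace("lme4", quietly = TRUE)) {
  # Same fixed part, with and without a random slope for T
  no_slope   <- lme4::glmer(counts ~ 1 + (1 | Site), data = dat, family = poisson)
  rand_slope <- lme4::glmer(counts ~ 1 + (T | Site), data = dat, family = poisson)
  print(anova(no_slope, rand_slope))  # likelihood-ratio comparison
}
```

The standard errors from the plain glm ignore the clustering by site, and a likelihood-ratio test of a variance component sits on the boundary of the parameter space, so both outputs are best read as rough guides rather than exact tests.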

Macro
  • (+1): writing out the model formula for each model is indeed the best way to make R-notations more transparent; good job! – ocram Oct 01 '12 at 13:53
  • @Macro One question on the equations above (thanks for them btw) - do they also have the usual error term in them? If so, what's that term's subscript? – Fomite Oct 02 '12 at 08:55
  • Hi - one way to write a GLM is as a model for $E(Y_{ij}|X)$ (or a 'linked' version) as I've done here. There is no error term for the expected value, if the model is correctly specified. To answer your question, in GLMs we're specifying the _distribution_ of $Y_{ij}|X$. The "leftover" randomness in a linear model is manifested by a normally distributed error term. But, in non-linear GLMs (e.g. Poisson, logistic) there is randomness "built in" since knowing the rate of a Poisson or a success prob of a Bernoulli trial doesn't allow you to predict a realization without error. Hope this helps. – Macro Oct 02 '12 at 14:34

You should note that T is not one of your model's random-effects terms, but a fixed effect. Random effects are only those effects that appear after the | in an lmer formula!

You can find a more thorough discussion of what this specification does in this lmer FAQ question.

According to that question, your model should give the following (for your fixed effect T):

  • A global slope
  • A random slopes term specifying the deviation from the overall slope for each level of Site
  • The correlation between the random intercepts and the random slopes.

And as said by @mark999 this indeed is a common specification. In repeated measures designs, you generally want to have random slopes and correlations for all repeated measures (within-subjects) factors.

See the following paper for some examples (which I tend to always cite here):

Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54–69. doi:10.1037/a0028347

Henrik
  • A similar reference from ecology: Schielzeth, Holger, and Wolfgang Forstmeier. 2009. “Conclusions Beyond Support: Overconfident Estimates in Mixed Models.” Behavioral Ecology 20 (2) (March 1): 416–420. doi:10.1093/beheco/arn145. http://beheco.oxfordjournals.org/content/20/2/416. – Ben Bolker Oct 01 '12 at 18:45

Something should appear only in the random part when you are not particularly interested in its parameter, per se, but need to include it to avoid dependent data. E.g., if children are nested in classes, you usually want children only as a random effect.

Peter Flom
  • But putting it down as both random and fixed makes no sense right? – Michael R. Chernick Oct 01 '12 at 10:42
  • Well, I wouldn't go so far as to say you *never* want to put them in the fixed effects too, but I think it would be very rare. You aren't really interested in how Mary did vs. how Joe did, you're just (as you know but some may not) using them as their own controls. – Peter Flom Oct 01 '12 at 10:45
  • Maybe I'm misunderstanding you, but I would have thought that having fixed and random effects for the same variable was more common than a variable having just a random effect. Having fixed and random effects for the same variable is not uncommon in the Pinheiro and Bates book. – mark999 Oct 01 '12 at 11:16
  • Hi @mark999 if you think I said that, then you are misunderstanding me. In my experience, in a given model, there is usually only 1 effect that is random and not fixed, sometimes there are none, sometimes (with complex nesting) 2. Certainly having something be fixed and random is common. That happens on nearly all these models. – Peter Flom Oct 01 '12 at 11:20
  • Yes Peter, I'm misunderstanding something. The way I read the previous comments, Michael asked whether having a fixed and random effect for the same variable makes no sense, and then you said that it would be very rare. – mark999 Oct 01 '12 at 11:25
  • Yes I thought that the OP specified the same variable as a fixed effect and a random effect in the same model. I don't know how that could be interpreted and it seems to make no sense. The variable has to be modelled as one or the other. – Michael R. Chernick Oct 01 '12 at 11:35
  • @MichaelChernick as I understand it, if you have a fixed effect and a random effect for the same variable, then the fixed effect is the overall effect in the population, while the random effect allows a different effect of the variable for each subject. There are several examples in Pinheiro & Bates. – mark999 Oct 01 '12 at 11:43
  • I think you must be misunderstanding or you are not being very clear. I see a variable can be treated either as fixed or random in the ways you describe but not both ways in the same model??? – Michael R. Chernick Oct 01 '12 at 11:46
  • @MichaelChernick No, I mean the same variable having both a fixed effect and a random effect in the same mixed-effects model. There are several examples in Pinheiro and Bates. – mark999 Oct 01 '12 at 11:50
  • @mark999 I have the book by Pinheiro and Bates and I know Jose Pinheiro very well. A mixed effects model is called that because it contains some covariates that are fixed and some that are random but the same variable can't be both fixed and random. Maybe instead of continually referring to the book you could give me an example from the book that you are referring to. I think this is a misunderstanding of terminology. – Michael R. Chernick Oct 01 '12 at 12:23
  • Maybe the example in section 4.2.1 on pages 146-148? The way I read it, the model called `fm10rth.lme` treats age as both a fixed and a random effect. – smillig Oct 01 '12 at 12:40
  • @PeterFlom, re: "if children are nested in classes, you usually want children only as a random effect." I think you mean that class is the random effect. Unless there is further nesting in the data (e.g. repeated measurements on kids) then child level random effects are not identified. – Macro Oct 01 '12 at 13:17
  • @macro Yes, that's what I meant, sorry. The terminology gets very confusing! That may be why Gelman eschews the terms 'fixed' and 'random' – Peter Flom Oct 01 '12 at 16:45
  • @MichaelChernick Using SAS, there is an example on p 12 of [this article by Singer](http://www.gse.harvard.edu/~faculty/singer/Papers/Using%20Proc%20Mixed.pdf) where CSES is both in the `random` statement and the `model` statement. – Peter Flom Oct 01 '12 at 17:05
  • @PeterFlom Okay but how does that make CSES both a random effect and a fixed effect? – Michael R. Chernick Oct 01 '12 at 17:12
  • @Macro Nice to see you helping with this question. I don't understand how a variable can be both a fixed and a random effect. I think somehow my disagreement with Peter and the OP could be about terminology, but if I am wrong can you help clear up my confusion? – Michael R. Chernick Oct 01 '12 at 17:14
  • @MichaelChernick Well, the model statement lists fixed effect and the random statement lists random effects. I think there's a lot of terminology confusion here, but I don't know how to clear it up. – Peter Flom Oct 01 '12 at 17:21
  • @Michael, I agree with you. In these kinds of hierarchical models, the random effects are defined by a grouping variable (as opposed to other multivariate models such as spatially indexed data sets, where the 'grouping' variable is continuously varying). In the OP's question, `Site` would be referred to as the random effect, not `T` or `A` or anything else. Thinking of it that way, `Site`'s effect clearly could not be both fixed and random, since the two wouldn't be identified from each other. You can have both fixed and random coefficients for a variable, but that's a different question. – Macro Oct 01 '12 at 17:23
  • @Macro Thank you! That is so well expressed. I know that you haven't been away that long but I have missed your presence. You may find that my behavior is improving and I am trying to adhere better to the site's policies. – Michael R. Chernick Oct 01 '12 at 17:44
  • Thanks @Michael and you're welcome. I'm glad to hear that and I can acknowledge that I'm not an angel - I've gone out of my way to needle you from time to time and that's not ok so I'm sorry about that (deleted my comments in the meta thread). See you around. – Macro Oct 02 '12 at 14:46
  • @Macro Keep up the good work! You didn't anger me often though and I was as guilty as you. Someone wrote (probably a now deleted comment) that they couldn't find your comment about taking a hiatus from CV. Another person commented back that it was in one of those Chernick-Macro arguments that always get deleted! – Michael R. Chernick Oct 02 '12 at 15:22