
I am a little confused about the following scenario. Assume I have a 2x3 design, looking at the effects of:

  1. Two mood-inducing conditions:

     • Happy
     • Neutral

  2. Three levels of task difficulty:

     • Easy
     • Moderate
     • Difficult

I wish to look at whether task performance is affected by the mood induced. A total sample of 100 participants will be split 50-50 across the levels of Factor 1. Factor 2 is within-subjects.

Factor 1 is a fixed factor here, since I am specifically interested in only two mood types. However, is Factor 2 considered fixed, since I am looking at three designated difficulty levels? Or is Factor 2 considered random, because difficulty is subjective and lies on a continuum? Or is Factor 2 considered random because the 100 participants are just a random sample from the much bigger population of people on earth?

Ferdi

2 Answers


I think the random/fixed effect terminology can be a little confusing: Andrew Gelman has blogged multiple times on this point. I believe the most useful definition is the one in terms of pooling: as stated in a great answer on this site, random effects are estimated with partial pooling, while fixed effects are not.

In your case, use fixed effects for Factors 1 and 2, and add random effects for Factor 2, but only if you have multiple observations at different values of Factor 2 for at least some of the 100 subjects. As a matter of fact, if you had the same 100 subjects but no multiple observations per subject, you wouldn't pool estimates among different subjects.

The reason why you use random effects is that you expect different individuals to react differently to the same levels of Factor 1 and Factor 2. Now, estimating a different set of parameters $\hat{\boldsymbol{\beta}}_i=(\hat{\beta}_{0i},\hat{\beta}_{1i},\hat{\beta}_{2i},\hat{\beta}_{3i})$ for each subject $i$ isn't a good idea (this would be the no pooling model in multilevel-modeling terminology). This would be an exceedingly flexible model (too many parameters), and also a useless one, because how would you make predictions for a new individual not included among the original 100? Also, unless you have the same number of replications for each subject, estimates for individuals with few replicates will have higher variance than estimates for individuals with more replicates.

At the same time, you don't want to pool the data from all individuals together and just estimate 4 parameters $\hat{\boldsymbol{\beta}}=(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\beta}_3)$ (the complete pooling model), because this way you're neglecting the information that you had multiple measurements, with varying levels of Factor 2 (but not of Factor 1), for each individual. What you can do instead is estimate fixed effects $\hat{\boldsymbol{\beta}}=(\hat{\beta}_0,\hat{\beta}_1,\hat{\beta}_2,\hat{\beta}_3)$, plus a random intercept and two random coefficients for Factor 2 for each individual, which you assume to come from a common multivariate normal distribution. This way you shrink the estimates for individuals with few replicates towards the fixed-effect estimates. This is the partial pooling or mixed effects model.
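To make this concrete, here is a minimal sketch of the partial pooling model in Python with statsmodels. Everything in it is illustrative: the column names, the effect sizes, and the assumption of two trials per subject per difficulty level are made up for the example, not taken from the question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Hypothetical long-format data for the 2x3 design: 100 subjects,
# mood between-subjects (50 happy / 50 neutral), difficulty within-subjects,
# with 2 trials per subject and difficulty level (an assumption).
n_subjects, n_trials = 100, 2
levels = ["easy", "moderate", "difficult"]
level_effect = {"easy": 0.0, "moderate": -1.0, "difficult": -2.0}

rows = []
for s in range(n_subjects):
    mood = "happy" if s < 50 else "neutral"
    # Subject-specific deviations: a random intercept and a random offset
    # per difficulty level -- this is what gets partially pooled.
    b0 = rng.normal(0.0, 1.0)
    b_diff = dict(zip(levels, rng.normal(0.0, 0.5, size=3)))
    for d in levels:
        for _ in range(n_trials):
            mu = 10.0 + 1.0 * (mood == "happy") + level_effect[d]
            rows.append({"subject": s, "mood": mood, "difficulty": d,
                         "performance": mu + b0 + b_diff[d] + rng.normal(0.0, 1.0)})

df = pd.DataFrame(rows)

# Partial pooling / mixed-effects model: fixed effects for mood, difficulty
# and their interaction; a per-subject random intercept plus random
# difficulty effects, assumed to come from a common multivariate normal.
model = smf.mixedlm("performance ~ mood * difficulty", data=df,
                    groups=df["subject"], re_formula="~difficulty")
result = model.fit()
print(result.summary())
```

Note that with only one trial per subject and difficulty level, the random difficulty effects would be confounded with the residual error, which is why the sketch assumes replicates; with a single trial per cell you would typically keep only the random intercept.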

DeltaIV
  • I am not disagreeing with your answer, but +1 to Knarpie and I don't quite understand your criticism of their answer. "As a matter of fact, if you had the same 100 subjects but no multiple observations per subject, you wouldn't pool estimates among different subjects." - that's correct, but only because subject variance would be collinear with error variance, so it wouldn't make sense to add it to the model. – amoeba Dec 12 '17 at 15:47
  • @amoeba apart from the fact that I did premise "maybe I misunderstood Knarpie's answer", the fact that subject variance would be collinear etc. is the key point. I'll try to explain myself, but this is really the kind of stuff that's better explained in words, or at worst in chat: when I read "You do need to add a random effect for subject though [..] The subjects are drawn from a large population, and you want to do inference on this population, not on the 100 subjects you have randomly drawn" I tend to disagree, because all linear regression models are based on a random sample from a.... – DeltaIV Dec 12 '17 at 18:08
  • ...population (or at least they should be). So if that is sufficient reason to use a mixed model, that seems to me to imply that we should always use mixed models. I don't think so: for example, if I don't have repeated measurements, it doesn't make sense to add a random effect, for the reason you mention. I would frame the question differently: we don't use mixed models because we want to make inference on the population, since we *always* want to make inference on the population. We use them if, together with the group-level or population-level effect, we would like to take into account.. – DeltaIV Dec 12 '17 at 18:16
  • ...the uncertainty due to the fact that different individuals from the same population react differently to the same treatment. And we can do that with a mixed model, only if we have more than one measurement for at least some of the individuals. If Knarpie is reading this, I hope s/he doesn't feel bad: I'm not trying to bash the other answer. It could just be that I misunderstood it. – DeltaIV Dec 12 '17 at 18:16
  • @amoeba of course, one could use a completely different point of view from "my" partial pooling/no pooling approach (it's not mine: but I gave due credit). We could say instead that mixed models are latent variable models, and that, strictly speaking, random effects are not model parameters but unobserved random variables. This is surely more rigorous, but it would have made the answer harder to understand IMO. – DeltaIV Dec 12 '17 at 18:16
  • 1
    Thanks @DeltaIV, this is interesting. Let me think about what you wrote. – amoeba Dec 12 '17 at 19:09
  • 1
    I think it's mostly semantics. When @Knarpie says that one needs to use mixed models whenever "subjects are drawn from a large population", the word "subjects" is supposed to refer to levels of some factor with >1 observations per level. It's not supposed to refer to observations themselves. Consider 2-way ANOVA with 3-level factor Treatment, 10-level factor Age, and 5 observations per cell. That's not a mixed model. Now keep the data but change Age to Subject. Now suddenly it should be a mixed model. Why - because subjects are randomly drawn blabla. It's equivalent to what you are saying. – amoeba Dec 12 '17 at 21:36
  • @amoeba that makes sense, thus I've removed any reference to Knarpie's answer in my answer. I don't follow your ANOVA example though - when Age becomes Subject, are you in favor of a mixed model, or against it? The data haven't changed, but our interpretation of them has. If this has something to do with balanced designs and choosing between mixed models and RM-ANOVA, I don't know enough to follow you. – DeltaIV Dec 13 '17 at 12:18
  • 1
    When Age becomes Subject, I am in favor of mixed model (or RM-ANOVA, this distinction does not matter in this context). That's exactly the point: the data did not change, but the model we should be using did. – amoeba Dec 13 '17 at 12:31
  • @amoeba ah ok! Then we agree 100%. Do you like my new answer better? – DeltaIV Dec 13 '17 at 12:41
1

Irrespective of how you measure Factor 2, it should be included as a fixed effect. You are only interested in these three levels, not in the "population" of possible difficulties they were drawn from.

You do need to add a random effect for subject though, to model the dependence between measurements on the same subject, albeit under different difficulties. The subjects are drawn from a large population, and you want to do inference on this population, not on the 100 subjects you have randomly drawn.
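As a rough illustration (not the answerer's own code), and assuming a long-format DataFrame like the simulated `df` in the other answer, with columns `performance`, `mood`, `difficulty` and `subject`, this specification could be written in statsmodels as:

```python
import statsmodels.formula.api as smf

# Fixed effects for mood, difficulty and their interaction; a random
# intercept per subject models the dependence between repeated
# measurements on the same subject.
model = smf.mixedlm("performance ~ mood * difficulty", data=df,
                    groups=df["subject"])
result = model.fit()
print(result.summary())
```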

Knarpie