8

I have a 2 x 2 x 2 mixed design with two between-subjects factors (sex, organizer) and one within-subjects factor (task). The group sizes for the 'sex' factor are unequal. When I run a factorial repeated-measures ANOVA, I get the following warning message:

Warning: Data is unbalanced (unequal N per group). Make sure you specified a well-considered value for the type argument to ezANOVA().

I used the following model in R:

library(ez)  # provides ezANOVA()

model <- ezANOVA(data = df, dv = .(top_start), wid = .(id),
                 between = .(sex, org), within = .(task),
                 type = 3, detailed = TRUE)

I used type = 3 for the ANOVA because, as I understand it, type 3 sums of squares are suited to unbalanced group sizes.

I have the following questions:

  • Do I need to code contrasts when all my factors have only two levels?
  • Did I use the right type of Anova?
  • Are there other ways to do this analysis?
Jaap

2 Answers

9

If you use type 3 for ANOVAs it is critical in R that you set the contrast to effect coding (i.e., "contr.sum").

The default contrast in R is dummy coding (in R parlance, treatment coding), in which 0 represents the first factor level. This doesn't make much sense when the model contains interactions, as explained on the page I linked to.

To set effect coding, run the following:

options(contrasts=c("contr.sum","contr.poly"))
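To see what the two codings do to the intercept, here is a minimal sketch on a made-up, unbalanced 2 x 2 data set (the data frame `toy` and its columns `a`, `b`, `y` are hypothetical and not from the question). It sets the contrasts per model instead of globally, so it does not change your session options:

```r
# Hypothetical unbalanced 2 x 2 data, for illustration only
set.seed(1)
toy <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(6, 10))),
  b = factor(rep(c("b1", "b2"), length.out = 16))
)
toy$y <- rnorm(16, mean = 10) + 2 * (toy$a == "a2") + 1 * (toy$b == "b2")

# Observed mean of each of the four cells
cell_means <- tapply(toy$y, list(toy$a, toy$b), mean)

# Same model, two codings
fit_trt <- lm(y ~ a * b, data = toy,
              contrasts = list(a = "contr.treatment", b = "contr.treatment"))
fit_sum <- lm(y ~ a * b, data = toy,
              contrasts = list(a = "contr.sum", b = "contr.sum"))

# Treatment coding: the intercept is the mean of the first cell (a1, b1)
int_trt <- coef(fit_trt)[["(Intercept)"]]
# Effect coding: the intercept is the unweighted mean of the cell means
int_sum <- coef(fit_sum)[["(Intercept)"]]

print(c(treatment = int_trt, first_cell = cell_means["a1", "b1"]))
print(c(sum = int_sum, unweighted_grand_mean = mean(cell_means)))
```

Both factors have only two levels here, and the intercept (and with it every lower-order term tested in the presence of the interaction) still changes meaning between the two codings.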

Alternatively, you can use the afex package, whose goals are similar to those of ez, with the difference that it automatically sets the contrasts to effect coding and uses type 3 as its default.
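A rough sketch of the afex route, assuming the package is installed and using its `aov_ez()` function. Since the original `df` is not available, the data below are simulated (`sim_df` and all values in it are made up); only the column names mirror the question:

```r
# Simulated long-format data: one row per subject and task (values are made up)
set.seed(42)
n_id <- 24
subjects <- data.frame(
  id  = factor(seq_len(n_id)),
  sex = factor(rep(c("f", "m"), times = c(15, 9))),   # unequal group sizes
  org = factor(rep(c("yes", "no"), length.out = n_id))
)
# No shared columns, so merge() returns every subject crossed with every task
sim_df <- merge(subjects, data.frame(task = factor(c("t1", "t2"))))
sim_df$top_start <- rnorm(nrow(sim_df), mean = 50, sd = 10)

# Run the mixed ANOVA only if afex is available
if (requireNamespace("afex", quietly = TRUE)) {
  fit <- afex::aov_ez(id = "id", dv = "top_start", data = sim_df,
                      between = c("sex", "org"), within = "task")
  print(fit)
}
```

With your own data you would pass `data = df` instead of `sim_df`.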

Henrik
  • Thanks for your answer! However, my first question still stands. In all the examples I've seen so far, contrast coding is used with 3 or more levels. All my factors have only **two** levels. Do I still need contrast coding (effect coding) in that case? – Jaap Dec 06 '13 at 08:59
  • 1
    Yes, you still need it; the number of levels doesn't make a difference here. The difference in the intercept and all the resulting problems are the same for two or more groups. The issue is the interpretation of the 0: independent of the number of groups, with treatment coding it is the first group, whereas with effect coding it is the unweighted grand mean. This affects all higher-order effects. And now give me my upvote... – Henrik Dec 06 '13 at 09:42
6

I'm certainly no ANOVA expert, but I guess the other way to do this analysis is to switch to a regression framework and use lme4, which doesn't mind unbalanced data and will itself work out what is 'between' and what is 'within'. I believe the relevant line for an additive model would be

mod0 <- lmer(top_start ~ (1 | id) + task + org + sex, data=df)

where you could add interactions/asterisks as appropriate.
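As a hedged sketch of what adding the interactions might look like, assuming lme4 is installed: the data below are simulated (`long`, the subject effect `u`, and all effect sizes are invented for illustration), and the additive model is compared with the full factorial one.

```r
# Simulated long-format data with a per-subject random intercept (made-up values)
set.seed(7)
n_id <- 24
subj <- data.frame(
  id  = factor(seq_len(n_id)),
  sex = factor(rep(c("f", "m"), times = c(15, 9))),   # unequal group sizes
  org = factor(rep(c("yes", "no"), length.out = n_id)),
  u   = rnorm(n_id, sd = 5)                           # subject-level deviation
)
long <- merge(subj, data.frame(task = factor(c("t1", "t2"))))
long$top_start <- 50 + long$u + 3 * (long$task == "t2") +
  rnorm(nrow(long), sd = 4)

if (requireNamespace("lme4", quietly = TRUE)) {
  # Additive model, as in the line above
  mod_add  <- lme4::lmer(top_start ~ task + org + sex + (1 | id), data = long)
  # Full factorial fixed effects: all two-way and the three-way interaction
  mod_full <- lme4::lmer(top_start ~ task * org * sex + (1 | id), data = long)
  # Likelihood-ratio comparison; anova() refits both models with ML first
  print(anova(mod_add, mod_full))
}
```

The `(1 | id)` term is what tells lme4 that rows sharing an `id` belong to the same subject.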

Henrik
conjugateprior
  • Yeah, but then you should probably add `task` as a random slope (i.e., `(task|id)`), **if** it has replicates (which it then should have). – Henrik Dec 06 '13 at 10:14
  • @Henrik could you explain a bit more what you mean with _random slope_ and _replicates_? Do you mean _repeated measures_ with _replicates_? – Jaap Dec 06 '13 at 10:38
  • @conjungateprior When looking for the **lme4** package, I also found the **nlme** package. What is the difference between those two? Why should I use **lme4** instead of **nlme**? – Jaap Dec 06 '13 at 10:58
  • 1
    @Jaap : Check the following thread on the matter http://stats.stackexchange.com/questions/5344/how-to-choose-nlme-or-lme4-r-library-for-mixed-effects-models/ – usεr11852 Dec 06 '13 at 12:14
  • 2
    @Jaap a 'random slope' would mean that the direction / slope, as well as the magnitude of an effect could vary by person. This sort of consideration takes off from where the discussion at http://conjugateprior.org/2013/01/formulae-in-r-anova/ stops. – conjugateprior Dec 06 '13 at 22:54