7

The data

Suppose we have a dataset d with two between-subject factors (i.e., groups), group and condition, and two within-subject factors (i.e., repeated-measures factors), topic and problem (I uploaded the data to pastebin, so everybody should be able to obtain it):

> d <- read.table(url("http://pastebin.com/raw.php?i=4hRFyaRj"), colClasses = c(rep("factor", 6), "numeric"))
> str(d)
'data.frame':   2928 obs. of  6 variables:
  $ code     : Factor w/ 183 levels "A03U","A08C",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ group    : Factor w/ 2 levels "control","experimental": 2 2 2 2 2 2 2 2 2 2 ...
  $ condition: Factor w/ 3 levels "alternatives",..: 3 3 3 3 3 3 3 3 3 3 ...
  $ topic    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
  $ problem  : Factor w/ 4 levels "AC","DA","MP",..: 3 4 1 2 3 4 1 2 3 4 ...
  $ mean     : num  94.5 94.5 86.5 84.5 80 46.5 73.5 43.5 51 39 ...

The data is from a behavioral experiment in which participants in six groups (2 levels of group times 3 levels of condition) worked on 16 tasks (for each of 4 topics 4 different problems). Allocation of participants to group/condition was fully random. Presentation of tasks was random insofar that problem was blocked within topic (i.e., for each topic all problems where presented sequentially), but order of problem and topic was random.
Update: The factor identifying the participant (in which topic and problem are nested) is code.

The Problem

How can I fit this dataset using lme?
(Sidenote: I would also consider using lme4, but I am kind of afraid of not having p-values, if there is something easily digestible as p-values, I would also consider lme4 an option).

So far I managed to fit an lme model with only one within-subject factor, but not two (see below).

What I tried

I can fit an lme model if I have just one within-subject factor:

require(nlme)
 m1 <- lme(mean ~ condition*group*problem, random = ~1|code/problem, 
           data = d, subset = topic == "1")

anova(m1)
                        numDF denDF F-value p-value
(Intercept)                 1   531   12101  <.0001
condition                   2   177      31  <.0001
group                       1   177       2  0.2178
problem                     3   531      35  <.0001
condition:group             2   177       1  0.3672
condition:problem           6   531      24  <.0001
group:problem               3   531       1  0.2180
condition:group:problem     6   531       2  0.0281

This (especially the df) nicely correspond with the results from an standard ANOVA (using ez):

require(ez)
ezANOVA(subset(d, topic == "1"), dv = .(mean), wid = .(code), between = .(condition, group), within = .(problem))$ANOVA

Warning: Data is unbalanced (unequal N per group). Make sure you specified a well-considered value for the type argument to ezANOVA().
                   Effect DFn DFd     F                             p p<.05     ges
2               condition   2 177 30.69 0.000000000003611248905859672     * 0.13079
3                   group   1 177  1.53 0.217821969825403999321267179       0.00374
5                 problem   3 531 34.85 0.000000000000000000014254103     * 0.10028
4         condition:group   2 177  1.01 0.367225806638525886782531416       0.00492
6       condition:problem   6 531 24.40 0.000000000000000000000000142     * 0.13503
7           group:problem   3 531  1.48 0.217959293081550348203379031       0.00472
8 condition:group:problem   6 531  2.38 0.028119961573665430004664856     * 0.01499

Trying to fit this data with two within-subject factors in lme fails (either per code, or per dfs):

m2 <- lme(mean ~ condition*group*problem*topic, random = ~1|code/(problem*topic), data = d)
# fails: Error in getGroups.data.frame(dataMix, groups) : 
#  Invalid formula for groups

m3 <- lme(mean ~ condition*group*problem*topic, random = ~1|code/problem/topic, data = d)
# the next model takes some time (probably already an indicator, that it is the wrong model)
# and produces wrong denominator df!

# with both factors as ANOVA
m4 <- ezANOVA(d, dv = .(mean), wid = .(code), between = .(condition, group), within = .(problem, topic))

#effects are the same
all(row.names(anova(m3))[-1] == m4$ANOVA$Effect)

#denominator dfs are not:
anova(m3)$denDF[-1] == m4$ANOVA$DFd

# only for effects with topic:
row.names(anova(m3))[-1][!(anova(m3)$denDF[-1] == m4$ANOVA$DFd)]

UPDATE: As the precise error or nesting is somewhat unclear I here provide the equivalent aov call (this is the "standard" model via aov), which matches the results from ezANOVA. The critical error term is Error(code/(problem*topic)):

m5 <- aov(mean ~ (condition*group*problem*topic) + Error(code/(problem*topic)), d)
summary(m5)
Henrik
  • 13,314
  • 9
  • 63
  • 123
  • Hi there, could you describe more about how group, condition, topic, and problem are related to each other in the design? – Michelle Feb 28 '12 at 21:32
  • Ohh, I am sorry I omitted this. It is a fully randomized experimental design. `group` and `condition` are two fully crossed between-subject factors (i.e., each cell consists of a unique set of participants). `topic` and `problem` are two fully crossed within-subject factors: Each participant responded to four different problems (denoted as `MP`, `MT`, ...) for each of the four different topics (1 to 4), in total each participant responded to 16 problems. The design is fully randomized (allocation to condition and order of topic and problems). – Henrik Feb 28 '12 at 21:39
  • @Michelle see also my update. – Henrik Feb 28 '12 at 22:07
  • Hi again, so `problem` is nested in `topic` and `group` and `condition` are not involved in any nesting? – Michelle Feb 28 '12 at 22:08
  • @Michelle As far as I understand nested, I think `problem` and `topic` are both nested within `code` which is the variable identifying the participant. `group` and `condition` are not nested within `code`. – Henrik Feb 28 '12 at 22:13
  • Are the p-values from `lme` actually reasonable? I don't know much about the topic, but `lme4` does several things different (like not giving p-values and not having a default `forecast` method) for what appear to be good reason. – Wayne Feb 28 '12 at 22:28
  • @Wayne Totally. They are pretty much the same as the p-values from the ANOVA (which is the usual way to analyze this data in Psychology). – Henrik Feb 28 '12 at 22:33
  • There is a function called pvals.fnc in the languageR package that allows you to get p values (calculated using Markov chain Monte Carlo sampling) for lme4 models. Perhaps you could try that? – Marius Feb 29 '12 at 00:39
  • Also, I'm more familiar with lme4's syntax than nlme, but does "random = ~1|code/problem/topic" mean you're trying to get random slopes for each subject for each of the within subjects variables? The model might be too complex. I would have just started with random intercepts for each subject. – Marius Feb 29 '12 at 00:42
  • @Marius - that's quite an involved hierarchy, so I agree with your comment about it being a complicated model. – Michelle Feb 29 '12 at 01:04
  • @Marius See my updates and my answer about this problem. – Henrik Feb 29 '12 at 10:57

1 Answers1

1

I found an answer to my question on this thread: Repeated measures ANOVA with lme in R for two within-subject factors (somehow this thread was already one of my favorites, I must have forgotten about it). The specification is a little unhandy.

m6 <- lme(mean ~ condition*group*problem*topic, 
   random = list(code=pdBlocked(list(~1, pdIdent(~problem-1), pdIdent(~topic-1)))), data = d)
anova(m6)

However, the denominator dfs are still wrong, as noted in the thread and apparent in comparisons between the ANOVA and lme dfs.

data.frame(effect = rownames(anova(m6)), denDf= anova(m6)$denDF)

m4$ANOVA[,c("Effect", "DFd")]

As long as there are no other ideas, I think I will need to do the analysis in lme4, for which I wil need to post another question.

Henrik
  • 13,314
  • 9
  • 63
  • 123