
I have the following data set:

                 |       Scenario 1       |     Scenario 2         |
                 |Trial 1|Trial 2| Trial 3|Trial 1|Trial 2| Trial 3|
 -------------------------------------------------------------------
              S1 | ...
 Condition 1  S2 | ...
              S3 |
 -------------------------------------------------------------------
              S5 |
 Condition 2  S6 |
              S7 |

So the Trials are nested in the Scenarios, and both are within-subject factors, while each subject gets only one Condition. I am trying to run an ANOVA on this data set. Here is the model without specifying that Scenarios (and Trials) are within-subject:

 my_data.aov <- aov(value ~ Condition * Trial %in% Scenario, data = my_data)  # works fine

But when I specify that these factors are within-subject:

my_data.aov <- aov(value ~ Condition * Trial %in% Scenario + Error(Player/(Trial %in% Scenario)), data = my_data)

I get the following error:

In aov(value ~ Condition * Trial %in% Scenario + Error(Player/(Trial %in%  :
Error() model is singular

The closest setup I could find was "Split plot in R", but there the subjects are nested inside each Trial, not inside each Condition.

EXAMPLE FILE

Here is an example file in long format.
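The linked file is not reproduced here. As a hypothetical illustration of the long layout described above, a simulated data set can be built in R like this. It has one row per player, scenario and trial, and Condition is fixed within each player. The names my_data, Player, Scenario, Trial, Condition and value follow the code in this question. The players S1 to S6, the level labels and the random values are assumptions made only for this sketch.

```r
# Hypothetical simulated data in long format (not the OP's example file):
# one row per Player x Scenario x Trial; Condition is constant within Player.
set.seed(1)
my_data <- expand.grid(Trial    = factor(1:3),
                       Scenario = factor(c("Scenario1", "Scenario2")),
                       Player   = factor(paste0("S", 1:6)))
# Between-subject factor: S1-S3 get Condition1, S4-S6 get Condition2
my_data$Condition <- factor(ifelse(as.integer(my_data$Player) <= 3,
                                   "Condition1", "Condition2"))
my_data$value <- rnorm(nrow(my_data))  # placeholder response values

head(my_data)
# Every player has 2 x 3 = 6 rows, all under a single Condition
table(my_data$Player, my_data$Condition)
```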

What about this approach?

If I treat each Trial as a sample, I can collapse the Trials within each Scenario by averaging them. That gives a simpler model in which each Subject's behavior is described per Scenario. Since I need to analyze the relationship value~Condition*Scenario, I can then define the error term as Error(Subject/Scenario).

Will this approach invalidate my analysis?
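Here is a minimal sketch of the collapsing approach on the same kind of hypothetical simulated data (not the OP's file). It averages the three Trials within each Player x Scenario cell with aggregate(), then fits the split-plot model with Error(Player/Scenario). Player stands in for what the text calls Subject.

```r
# Collapsing approach on hypothetical simulated data (not the OP's file)
set.seed(1)
my_data <- expand.grid(Trial    = factor(1:3),
                       Scenario = factor(c("Scenario1", "Scenario2")),
                       Player   = factor(paste0("S", 1:6)))
my_data$Condition <- factor(ifelse(as.integer(my_data$Player) <= 3,
                                   "Condition1", "Condition2"))
my_data$value <- rnorm(nrow(my_data))

# One mean per Player x Scenario: average over the three Trials
collapsed <- aggregate(value ~ Player + Condition + Scenario,
                       data = my_data, FUN = mean)

# Split-plot ANOVA: Condition between players, Scenario within players
collapsed.aov <- aov(value ~ Condition * Scenario + Error(Player / Scenario),
                     data = collapsed)
summary(collapsed.aov)
```

Note a side effect of this approach: aggregate() drops missing values before averaging. A player with a missing trial therefore still gets a mean for that Scenario, computed from the remaining trials.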

Pio
  • What does the last sentence mean? Do all subjects get all the conditions? I'm afraid you don't have a split-plot design, and you have now misspecified your model so that it has non-estimable parameters. – Horst Grünbusch May 12 '14 at 12:03
  • As the data description shows, `S1, S2, S3, ...` take only `Condition 1` and `S7, ...` only `Condition 2`, but I want to compare the effects of `Condition 1` with those of `Condition 2`. I think it fails because each `subject` goes through every `Scenario`, which in turn nests the `Trial`s, and I could not integrate that clearly into the model. – Pio May 12 '14 at 12:05
  • Now I see. Try `Error(Player/(Trial*Subject))`. The nesting is irrelevant to specifying this error term. You have 3*2 measures for each subject and want to integrate all the possible covariances between these measures into your model. To this end, you don't need to specify that these covariances may be decomposable into one part from the trial and one from the scenario. The covariance would be the same if scenario and trial were crossed. In fact, this is the part that makes your covariance parameters non-estimable. – Horst Grünbusch May 12 '14 at 12:41
  • You meant `Error(Player/(Trial*Scenario))`? – Pio May 12 '14 at 12:45
  • Ah, yes, of course, sorry! Does it work? – Horst Grünbusch May 12 '14 at 12:47
  • I get the same error. – Pio May 12 '14 at 13:05
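One way to check whether the specification itself is the problem is to fit it to simulated, balanced and complete data (hypothetical values, not the OP's file). The sketch assumes the Trials are labelled 1 to 3 inside each Scenario. Under that assumption, crossing Scenario and Trial indexes the same six cells per player, as the comment above argues. The fixed part is written as Condition * Scenario * Trial, and the code records any warnings that aov() raises.

```r
# Hypothetical balanced, complete data (not the OP's file)
set.seed(1)
my_data <- expand.grid(Trial    = factor(1:3),
                       Scenario = factor(c("Scenario1", "Scenario2")),
                       Player   = factor(paste0("S", 1:6)))
my_data$Condition <- factor(ifelse(as.integer(my_data$Player) <= 3,
                                   "Condition1", "Condition2"))
my_data$value <- rnorm(nrow(my_data))

# Fit the suggested error structure and keep any warnings aov() raises
warnings_seen <- character(0)
my_data.aov <- withCallingHandlers(
  aov(value ~ Condition * Scenario * Trial + Error(Player / (Trial * Scenario)),
      data = my_data),
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
warnings_seen  # empty for balanced, complete data
summary(my_data.aov)
```

On complete data like this, no "Error() model is singular" warning appears. That points to the data rather than the formula as the source of the problem.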

2 Answers


Without an example dataset (say from 6 subjects) it is hard to say for certain. You might try to use dput towards this end.

The message "Error() model is singular" suggests to me that you have insufficient degrees of freedom to fit your model. I think it is unlikely that you meant to use %in% in this context, but maybe I'm wrong. Look at ?"%in%": %in% produces a logical TRUE/FALSE vector of whether the Trial number matches the Scenario number. When you then estimate the subject effect, you only have two observations in the cell where the Trial number matches the Scenario number, and you won't have enough observations per subject left over to also look at a Condition * Trial %in% Scenario interaction.
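R treats %in% differently inside and outside a model formula. Outside a formula, it is the matching function documented in ?"%in%". Inside a formula, ?formula documents it as a nesting operator, so Trial %in% Scenario contributes a single term that contains both Trial and Scenario. The snippet below is a quick way to inspect both readings; the variable names are simply the ones from the question.

```r
# Outside a formula, %in% is base R's matching function
c(1, 2, 3) %in% c(2, 3)  # FALSE TRUE TRUE

# Inside a model formula it is a nesting operator (see ?formula);
# terms() shows how the question's fixed-effects formula is expanded
tt <- terms(value ~ Condition * Trial %in% Scenario)
attr(tt, "term.labels")
# Columns of the "factors" matrix are terms; the second term involves
# both Trial and Scenario, not a logical comparison of the two
attr(tt, "factors")
```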

Consider:

library(doBy)
summaryBy(value ~ Condition * Trial %in% Scenario, data = my_data, FUN = c(mean, var, length))

... and be sure it matches your expectations.
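If doBy is not available, a base R check in the same spirit is to count the non-missing values in every Player x Scenario x Trial cell. In a complete repeated-measures layout, each cell holds exactly one value. The data below are simulated, and one value is blanked out on purpose as a hypothetical missing observation.

```r
# Hypothetical simulated data (not the OP's file) with one missing value
set.seed(1)
my_data <- expand.grid(Trial    = factor(1:3),
                       Scenario = factor(c("Scenario1", "Scenario2")),
                       Player   = factor(paste0("S", 1:6)))
my_data$Condition <- factor(ifelse(as.integer(my_data$Player) <= 3,
                                   "Condition1", "Condition2"))
my_data$value <- rnorm(nrow(my_data))
my_data$value[5] <- NA  # row 5 is Player S1, Scenario2, Trial 2

# Count non-missing values per Player x Scenario x Trial cell;
# a complete layout has exactly one value in every cell
counts <- xtabs(~ Player + Scenario + Trial,
                data = subset(my_data, !is.na(value)))
# Players with at least one empty cell
incomplete <- names(which(apply(counts == 0, 1, any)))
incomplete  # "S1"
```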

russellpierce
  • I am getting `NA` for `mean` and `var`. – Pio May 12 '14 at 15:12
  • I expected var to be NA, because you can't calculate a variance from a single observation. Do you have missing values in your dataset? `summaryBy(value ~ Condition * Trial %in% Scenario, data = my_data, FUN = function(x) {any(is.na(x))})` – russellpierce May 12 '14 at 15:37
  • The result should say "TRUE" for any cell in which you are missing values. – russellpierce May 12 '14 at 15:56
  • Yes, I have missing values. – Pio May 12 '14 at 17:17
  • Repeated measures ANOVA cannot handle missing values. – russellpierce May 12 '14 at 17:29
  • @Pio: I'm sorry this isn't the outcome you were hoping for. I hope you'll mark my answer as accepted all the same. Alternatively, if there are points I could clarify, please let me know. – russellpierce May 12 '14 at 18:45
  • Repeated measures ANOVA with missing values is possible, even as an available-case analysis without the need for imputation. It's just not published yet. – Horst Grünbusch May 12 '14 at 22:38
  • @Horst, such a procedure would not be repeated measures ANOVA. Mixed models can fit models with missing data (and that has been published and peer reviewed). However those are not repeated measures ANOVA. I think your comment only serves to cloud the issue. – russellpierce May 13 '14 at 05:11
  • @rpierce, so what you're saying is that there is no model that could incorporate all the aspects of this experiment the way I described? – Pio May 13 '14 at 10:01
  • Well, as @HorstGrünbusch indicated and I acknowledged, there are other types of model that could potentially fit your data. However, there is no proper model specification that you can fit to your data with your selected variables and the aov function without dropping players MlpV0YuEXKZrU5Wg0VGlWYyi, jsp75Kqov11ZgWn3B5WjglWa, BWtRG1272KYjC8N0oE51ScrH, uYJHoWZJLrnCKDWWzoo4UEGL, and LCawAtJXipsGxSkTOHlRD7v5. – russellpierce May 13 '14 at 11:37
  • You only have 6 or 7 players (independent replications), right? You may consider dropping some trials. Are these random effects anyway? Alternatively, you can use SAS PROC MIXED, which is quite versatile with repeated measures, but then it's no longer pure ANOVA, as @rpierce pointed out. – Horst Grünbusch May 13 '14 at 14:03
  • @HorstGrünbusch: With respect, I recommend caution. Mixed models are more complex than SAS lets on; this can lead users without mixed model experience down unfortunate paths. Although a mixed model will likely 'work' here, I want it clearly specified that PROC MIXED is not a panacea for repeated measures data with missing observations. – russellpierce May 13 '14 at 14:43
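The thread above traces the warning to missing values. On simulated data (hypothetical, not the OP's file), blanking a single value is enough to make the Error() model singular. With one observation per cell, the saturated error model then has one more column than there are observations. Dropping the affected players, the complete-case route discussed above, removes the warning. fit_with_warnings is a helper defined only for this sketch.

```r
# Hypothetical simulated data (not the OP's file) with one missing value
set.seed(1)
my_data <- expand.grid(Trial    = factor(1:3),
                       Scenario = factor(c("Scenario1", "Scenario2")),
                       Player   = factor(paste0("S", 1:6)))
my_data$Condition <- factor(ifelse(as.integer(my_data$Player) <= 3,
                                   "Condition1", "Condition2"))
my_data$value <- rnorm(nrow(my_data))
my_data$value[5] <- NA

# Helper defined only for this sketch: fit the repeated-measures ANOVA
# and return the fit together with any warning messages
fit_with_warnings <- function(d) {
  msgs <- character(0)
  fit <- withCallingHandlers(
    aov(value ~ Condition * Scenario * Trial + Error(Player / (Scenario * Trial)),
        data = d),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(fit = fit, warnings = msgs)
}

with_na <- fit_with_warnings(my_data)
with_na$warnings  # contains "Error() model is singular"

# Complete-case route: drop every player with at least one missing value
bad_players <- unique(as.character(my_data$Player[is.na(my_data$value)]))
complete_data <- droplevels(subset(my_data, !(Player %in% bad_players)))
without_na <- fit_with_warnings(complete_data)
without_na$warnings  # empty
```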

I found a solution, although it solves my bigger problem rather than the exact question. As you can see from my question, I went down a road where I got stuck, and I didn't really get an answer on how to solve it. So I opted for the lme4 package and used the lmer function to model the relationships in my data.

BTW, the approach I suggested, collapsing the data, is a traditional approach in psychology, as I found out recently (after asking the question).

Finally, the tutorial that saved the day for me (using lmer) was written by Bodo Winter. In it, he works on a dataset that almost matches mine, even though that is not obvious on a first reading.

In short, my linear mixed model looks like this:

 library(lme4)
 lmer(value ~ Condition * Scenario + (1 + Scenario | Player) + (1 | Scenario/Trial), data = my_data)

This perfectly models my experimental setup.
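In lme4's random-effects syntax, (1 | Scenario/Trial) is shorthand for (1 | Scenario) + (1 | Scenario:Trial). That gives a random intercept for each Scenario and one for each Trial within a Scenario. The / expansion is the same one used in ordinary formulas, and base R can show it without loading lme4.

```r
# lme4 expands (1 | Scenario/Trial) into (1 | Scenario) + (1 | Scenario:Trial);
# the "/" operator expands the same way in an ordinary formula:
nested_terms <- attr(terms(~ Scenario / Trial), "term.labels")
nested_terms  # "Scenario" "Scenario:Trial"
```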

Pio
  • I'm perplexed as to why you selected this answer to a question that asks "what can cause". I answered what can cause. What one can do instead is a different question. Collapsing the data is traditional in psychology (and many other fields) because it yields data you can run an ANOVA on. The approach you decided on is the mixed model approach that Horst and I mentioned above (lmer is the R version of PROC MIXED). I suspect your next question might be "where are the p-values". Rather than get into why that is a bad idea, here is a link: http://cran.r-project.org/web/packages/afex/index.html – russellpierce May 20 '14 at 13:18