1

I am encountering an issue using the aov() function in R. I have a two factor randomized block model and would like to perform anova following the model presented in the book "Analysis of variance and covariance" (P. Doncaster and A. Davey). They have the following example :

4.2/6.2/7.2 Two-factor randomized complete block model Y = S'|B|A

aovdata <- read.table("http://www.southampton.ac.uk/~cpd/anovas/datasets/R/Model4_2.txt", header = T)
 attach(aovdata)
 S <- factor(S) ; A <- factor(A) ; B <- factor(B)

Commands for Model-1 factorial analysis

 model4_2i <- aov(Y ~ S + A*B + Error(S + S:A + S:B))
 summary(model4_2i)

(available on http://www.southampton.ac.uk/~cpd/anovas/datasets/ANOVA%20in%20R.htm#model4_2)

This renders the following table :

> summary(model4_2i)

Error: S
  Df Sum Sq Mean Sq
S  3  9.075   3.025

Error: S:A
          Df Sum Sq Mean Sq F value   Pr(>F)    
A          2  745.4   372.7   67.58 7.68e-05 ***
Residuals  6   33.1     5.5                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: S:B
          Df Sum Sq Mean Sq F value Pr(>F)
B          1  91.65   91.65   4.134  0.135
Residuals  3  66.51   22.17               

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)  
A:B        2 186.37   93.18   7.819 0.0213 *
Residuals  6  71.51   11.92                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

My issue is that when I try to use this same model with my own data, I don't have the expected structure :

model <- aov(Y ~ S + A*B + Error(S + S:A + S:B),data)
summary(model)

I get :

   > summary(model)

Error: S
  Df  Sum Sq Mean Sq
S  3 0.05215 0.01738

Error: S:A
          Df  Sum Sq Mean Sq F value Pr(>F)  
A          3 0.18511 0.06170   5.286 0.0266 *
B          1 0.01334 0.01334   1.143 0.3163  
Residuals  8 0.09339 0.01167                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: S:B
          Df   Sum Sq  Mean Sq F value Pr(>F)  
B          1 0.025230 0.025230  19.162 0.0484 *
A:B        1 0.003853 0.003853   2.927 0.2293  
Residuals  2 0.002633 0.001317                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
          Df  Sum Sq  Mean Sq F value Pr(>F)
A:B        3 0.01858 0.006194   1.164  0.382
Residuals  8 0.04255 0.005319

I don't understand why factor B is in the "Error S:A" term, or why the A:B interaction is in the "Error B:S". I don't know what I do differently or where I go wrong. Any ideas?

Here is the dataframe :

> data
   B A S    Y
1  1 1 1 2.64
2  1 1 2 2.60
3  1 1 3 2.54
4  1 1 4 2.56
5  2 1 1 2.63
6  2 1 2 2.54
7  2 1 3 2.62
8  2 1 4 2.57
9  1 3 1 2.74
10 1 3 2 2.70
11 1 3 3 2.81
12 1 3 4 2.67
13 2 3 1 2.59
14 2 3 2 2.62
15 2 3 3 2.80
16 2 3 4 2.58
17 1 2 1 2.63
18 1 2 2 2.54
19 1 2 3 2.73
20 1 2 4 2.76
21 2 2 1 2.48
22 2 2 2 2.68
23 2 2 3 2.57
24 2 2 4 2.73
25 1 4 2 2.50
26 1 4 3 2.75
27 1 4 4 2.44
28 2 4 1 2.34
29 2 4 2 2.34
30 2 4 3 2.54
31 2 4 4 2.45
kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
user299955
  • 13
  • 2

1 Answers1

2

With only 31 observations in your data set, you don't have balance. As the aov manual page says:

aov is designed for balanced designs, and the results can be hard to interpret without balance...

Your data set is specifically missing a 32nd case for which B=1, A=4, and S=1. If you provide such a case along with a corresponding value of Y, restoring balance, you will get the type of summary report that you expect. If there isn't such a case, then you can't properly use aov for the analysis.

In this case, the unexpected form of the summary report has prevented you from using aov inappropriately with unbalanced data. It's good that you paid enough attention to the form of the summary to catch that problem.

EdM
  • 57,766
  • 7
  • 66
  • 187
  • I see, thank you! May I ask if you have any suggestions of analyses I could do with this unbalanced data set? – user299955 Oct 16 '20 at 09:30
  • @user299955 you could try the analysis with the `lm` interface instead of `aov`. A big problem is that your model will probably be terribly overfit. With 4 levels in each of A and S and the A:B interaction term, you need to estimate 10 coefficients beyond the intercept with only 31 observations total. You generally need about 10 observations per coefficient you are estimating. And with the imbalance you will still have to consider which [type of ANOVA](https://stats.stackexchange.com/q/20452/28500) is best for your study to evaluate the `lm` results. – EdM Oct 16 '20 at 13:32
  • Ok, thanks a lot for the help! – user299955 Oct 16 '20 at 18:01