
My question: when performing a two-way ANOVA (type III sum of squares), what is a valid approach to handling cells with zero observations?

More details:

I would like to apply a two-way ANOVA, followed by a Tukey post-hoc test, to statistically assess outputs from a 2x4 factorial-design experiment (2 treatments x 4 time points). The experimental details are as follows:

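| Time point | Treatment | Number of observations |
|-----------:|-----------|-----------------------:|
| 0          | Con       | 8                      |
| 0          | Expo      | 0                      |
| 1          | Con       | 8                      |
| 1          | Expo      | 7                      |
| 2          | Con       | 8                      |
| 2          | Expo      | 8                      |
| 3          | Con       | 8                      |
| 3          | Expo      | 8                      |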

Because the data are unbalanced and I expect an interaction between the factors, I believe Type III sums of squares are appropriate, since they are based on unweighted marginal means. I am attempting to analyse the data in R using the `Anova()` function from the 'car' package.
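To see why the empty cell matters for unweighted marginal means, here is a minimal base-R sketch. It assumes the mock design from the question (random `dv` values, and 8 rather than 7 observations in the TP1/EXPO cell, as in the mock); the names `design`, `cellMeans`, `unweighted` and `weighted` are illustrative only. Unweighted marginal means average the cell means, so the empty TP0/EXPO cell leaves the EXPO marginal mean undefined, whereas weighted means (averages of the raw observations) can still be computed.

```r
# Rebuild the mock design (TP0 contains CON only; illustrative, not real data)
design <- data.frame(
  Treatment = factor(c(rep("CON", 8), rep(rep(c("CON", "EXPO"), each = 8), 3))),
  TimePoint = factor(c(rep("TP0", 8), rep(c("TP1", "TP2", "TP3"), each = 16)))
)
set.seed(2)
design$dv <- rnorm(nrow(design))

# Cell means: one entry per TimePoint x Treatment combination;
# tapply() fills combinations with no observations with NA
cellMeans <- with(design, tapply(dv, list(TimePoint, Treatment), mean))

# Unweighted marginal means average the cell means, so the empty cell propagates NA
unweighted <- colMeans(cellMeans)

# Weighted marginal means average the raw observations, so both levels are defined
weighted <- with(design, tapply(dv, Treatment, mean))

print(cellMeans)
print(unweighted)
print(weighted)
```

In this sketch `unweighted["EXPO"]` is `NA`: the hypothesis a Type III test of Treatment would be built on is not defined for this design.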

Here is a minimal 'working' example:

```r
# Data frame demonstrating the study design (TP0 contains CON only)
df <- data.frame(
  Treatment = factor(c(rep("CON", 8), rep(c(rep("CON", 8), rep("EXPO", 8)), 3))),
  TimePoint = factor(c(rep("TP0", 8), rep("TP1", 16), rep("TP2", 16), rep("TP3", 16)))
)

# Generate mock dependent variable ('dv') values
set.seed(2)
df$dv <- rnorm(nrow(df))

# Install/load the 'car' package
# install.packages("car")
library(car)

# Interaction model with sum-to-zero contrasts, as required for Type III tests
model <- Anova(lm(dv ~ TimePoint * Treatment,
                  data = df,
                  contrasts = list(TimePoint = contr.sum, Treatment = contr.sum)),
               type = 3)
```

The above code generates an error:

```
Error in Anova.III.lm(mod, error, singular.ok = singular.ok, ...) : 
  there are aliased coefficients in the model
```

I believe this is caused specifically by the empty cell (zero observations at Time point 0, Treatment 'EXPO').
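One way to check this diagnosis is a minimal base-R sketch on the same mock design (random `dv`, 8 observations in TP1/EXPO as in the mock; `design`, `interactionFit`, `additiveFit` and `aliased` are illustrative names). With sum contrasts the interaction model has 8 coefficients but only 7 non-empty cells, so `lm()` reports one coefficient as `NA` (aliased), which is what `Anova(type = 3)` refuses. The additive model needs only 5 coefficients and is full rank.

```r
# Rebuild the mock design (TP0 contains CON only; illustrative, not real data)
design <- data.frame(
  Treatment = factor(c(rep("CON", 8), rep(rep(c("CON", "EXPO"), each = 8), 3))),
  TimePoint = factor(c(rep("TP0", 8), rep(c("TP1", "TP2", "TP3"), each = 16)))
)
set.seed(2)
design$dv <- rnorm(nrow(design))

# Cross-tabulate the design: the TP0/EXPO cell has zero rows
cellCounts <- table(design$TimePoint, design$Treatment)
print(cellCounts)

sumContrasts <- list(TimePoint = contr.sum, Treatment = contr.sum)

# Interaction model: 8 coefficients, but only 7 cells contain data
interactionFit <- lm(dv ~ TimePoint * Treatment, data = design, contrasts = sumContrasts)
aliased <- names(coef(interactionFit))[is.na(coef(interactionFit))]
print(aliased)  # the single non-estimable (aliased) interaction coefficient

# Additive model: 5 coefficients, all estimable from the 7 non-empty cells
additiveFit <- lm(dv ~ TimePoint + Treatment, data = design, contrasts = sumContrasts)

# car::Anova(type = 3) accepts the full-rank additive fit; run only if car is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(additiveFit, type = 3))
}
```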

The data come from a typical toxicological study in which time point 0 is the 'reference' group. It was therefore not justifiable to use an additional 8 organisms simply to balance the design and provide extra 'baseline' information.

Here are the options I have considered:

  1. Fill the empty Time point 0 / Treatment 'Expo' cell with data duplicated from Time point 0 / Treatment 'Con'. On the one hand, this seems reasonable: time point 0 is the starting point of the toxicological study, and the Time point 0 / Treatment 'Con' data are an entirely appropriate set against which to compare all other factor-level combinations. On the other hand, duplicating data for this purpose does not feel scientifically sensible, valid, or defensible. I would very much appreciate input from CV members on this.

  2. Split the Time point 0 / Treatment 'Con' data in half, assigning 50% to Time point 0 / Treatment 'Con' and the other 50% to Time point 0 / Treatment 'Expo'. Again, I am not sure this is sensible, because the way in which the data are split could affect the final statistical output.

  3. I have considered the findings from another Cross Validated post, "Missing cells with Type III SS". Unfortunately, it only deals with an additive model, whereas I want to fit a model that includes the interaction.
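The concerns behind options 1 and 2 can be made concrete with a base-R sketch on the mock design (random `dv`, 8 observations in TP1/EXPO as in the mock). `splitTP0()` and `fitModel()` are hypothetical helpers, and the two splits (first four vs. alternate organisms) are arbitrary choices for illustration. Option 1 appends 8 copied rows but adds only one parameter, so the residual degrees of freedom rise from 49 to 56 without any new organisms. Option 2 keeps n = 56, but the Treatment estimate generally changes with the split.

```r
# Rebuild the mock design (TP0 contains CON only; illustrative, not real data)
design <- data.frame(
  Treatment = factor(c(rep("CON", 8), rep(rep(c("CON", "EXPO"), each = 8), 3))),
  TimePoint = factor(c(rep("TP0", 8), rep(c("TP1", "TP2", "TP3"), each = 16)))
)
set.seed(2)
design$dv <- rnorm(nrow(design))
tp0 <- which(design$TimePoint == "TP0")

# Hypothetical helper: interaction model with sum-to-zero contrasts
fitModel <- function(data) {
  lm(dv ~ TimePoint * Treatment, data = data,
     contrasts = list(TimePoint = contr.sum, Treatment = contr.sum))
}

# Option 1: append a copy of the TP0/CON rows relabelled as TP0/EXPO
dupRows <- design[tp0, ]
dupRows$Treatment <- factor(rep("EXPO", length(tp0)), levels = levels(design$Treatment))
option1 <- rbind(design, dupRows)

# Option 2 (hypothetical helper): relabel a chosen subset of TP0/CON rows as EXPO
splitTP0 <- function(data, rowsToRelabel) {
  data$Treatment[rowsToRelabel] <- "EXPO"
  data
}
option2a <- splitTP0(design, tp0[1:4])           # first four organisms
option2b <- splitTP0(design, tp0[c(1, 3, 5, 7)]) # alternate organisms

fitOrig <- fitModel(design)
fit1 <- fitModel(option1)
fit2a <- fitModel(option2a)
fit2b <- fitModel(option2b)

# Option 1: 8 extra rows, 1 extra parameter, so residual df grows by 7
print(c(original = df.residual(fitOrig), option1 = df.residual(fit1)))

# Option 2: identical data, different arbitrary splits, different estimates
print(c(splitA = coef(fit2a)[["Treatment1"]], splitB = coef(fit2b)[["Treatment1"]]))
```

Under option 1 the copied rows are treated as independent organisms, which is what inflates the degrees of freedom; under option 2 the result depends on a labelling choice that carries no experimental information.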

I look forward to reading your suggestions.

MRJ.
