There is some demand for effect-size measures in post hoc analyses, and I am trying to decide whether or not to provide for this in an R package of mine. If I do, I would concentrate on Cohen's $d$-style measures, which, in the context of comparing the $i$th and $j$th means in a simple one-way experiment, are defined as $$ d_{ij} = \frac{\mu_i - \mu_j}{\sigma} $$ Here, $\mu_i$ and $\mu_j$ are the means, and $\sigma$ is the error standard deviation of the data, assumed common to all treatments. I can figure out how to estimate $d_{ij}$ from observed data and construct a confidence interval for it that takes into account the uncertainty in the estimates of all three parameters.
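For reference, here is a minimal sketch of the point estimate in the one-way case. The `PlantGrowth` data set from base R is only a stand-in for a generic one-way experiment (it is not part of my question), and the sketch deliberately stops at the point estimate; it does not construct the confidence interval.

```r
# Illustration only: PlantGrowth (base R 'datasets') stands in for a
# generic one-way experiment with a common error SD.
fit <- lm(weight ~ group, data = PlantGrowth)

# Observed treatment means, as a named numeric vector
means <- sapply(split(PlantGrowth$weight, PlantGrowth$group), mean)

# Pooled residual SD from the one-way fit: the estimate of sigma
sigma_hat <- sigma(fit)

# Matrix of estimated d_ij = (mean_i - mean_j) / sigma_hat
d_hat <- outer(means, means, "-") / sigma_hat
d_hat["trt2", "trt1"]
```

The matrix is antisymmetric with a zero diagonal, so only the entries above (or below) the diagonal are of interest.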
However, my questions have to do with extending these ideas to split-plot experiments. There is some discussion of this here, but that focuses on pre-post testing rather than a more general case. For concreteness, I'll focus on Yates's classic oat-yield experiment (data available as `Oats` in the nlme package in R). This experiment has six blocks (factor `Block`); each block is subdivided into three plots, which are randomly assigned to the three levels of factor `Variety`; and each plot is divided into four subplots, which are randomly assigned to the four levels of factor `nitro`. I have questions about two ways that these data could be analyzed.
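The layout just described can be checked directly from the data. This is only a quick sketch; it assumes the nlme package (a recommended package that normally ships with R) is installed.

```r
# Load the data set without attaching the whole package
data("Oats", package = "nlme")

nrow(Oats)                           # 6 blocks x 3 plots x 4 subplots
nlevels(Oats$Block)                  # number of blocks
nlevels(Oats$Variety)                # number of varieties
sort(unique(Oats$nitro))             # the nitro levels

# Each Block x Variety cell is one whole plot containing 4 subplots
plot_sizes <- with(Oats, table(Block, Variety))
plot_sizes
```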
Homogeneous mixed model
I think this is the easy question... In this model, we assume that the `Block` effects are iid $N(0,\sigma_B^2)$, the whole-plot effects (identified by `Block:Variety`) are iid $N(0,\sigma_P^2)$, and the residual (subplot) effects are iid $N(0,\sigma_E^2)$. An additive model with fixed effects for `Variety` and `nitro` fits pretty well.
What I surmise is that we can estimate Cohen's $d$ values for either `Variety` comparisons or `nitro` comparisons as the observed comparison, divided by an estimate of the total SD, $\sigma_T = \sqrt{\sigma_B^2 + \sigma_P^2 + \sigma_E^2}$. Is that correct? Or do people use some other $\sigma$ as the reference (other than perhaps modeling blocks and/or plots as fixed effects)?
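To make the proposal concrete, here is a sketch of what I have in mind, using `lme()` from nlme. Treating `nitro` as a factor and pulling the variance components out of the `VarCorr()` table are my own choices for illustration, and whether $\sigma_T$ is the right reference is exactly what I am asking.

```r
library(nlme)
data("Oats", package = "nlme")

oats <- as.data.frame(Oats)
oats$nitro <- factor(oats$nitro)     # treat nitro as a 4-level factor

# Additive fixed effects; random intercepts for blocks and whole plots
fit <- lme(yield ~ Variety + nitro, random = ~ 1 | Block/Variety,
           data = oats)

# VarCorr() returns a character table; header rows become NA and are dropped,
# leaving the Block, Block:Variety, and residual variances
vc <- VarCorr(fit)
v <- suppressWarnings(as.numeric(vc[, "Variance"]))
v <- v[!is.na(v)]

sigma_T <- sqrt(sum(v))              # proposed reference SD

# Comparisons with the first level of each factor, scaled by sigma_T
b <- fixef(fit)
d_variety <- b[grep("^Variety", names(b))] / sigma_T
d_nitro   <- b[grep("^nitro", names(b))] / sigma_T
d_variety
d_nitro
```

These are point estimates only; an interval would also have to account for the uncertainty in all three variance components.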
Multivariate model
Another possibility is to model the four observations on each plot (corresponding to the four levels of `nitro`) as a multivariate response variable. In that case, the `Block` effects, if modeled as random, have a multivariate distribution, and so do the errors. The plot effects are subsumed in these multivariate distributions.
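As a rough sketch of this setup, the data can be reshaped to one row per plot and fitted with a multivariate linear model. Note the simplification: here `Block` is entered as a fixed effect, which is an assumption made only to keep the example in base R; it is not the random-block version described above.

```r
data("Oats", package = "nlme")

# Plain data frame with an unordered Block factor
oats <- data.frame(Block   = factor(Oats$Block, ordered = FALSE),
                   Variety = Oats$Variety,
                   nitro   = Oats$nitro,
                   yield   = Oats$yield)

# One row per whole plot, one yield column per nitro level
wide <- reshape(oats, idvar = c("Block", "Variety"), timevar = "nitro",
                v.names = "yield", direction = "wide")
Y <- as.matrix(wide[, grep("^yield", names(wide))])

# Multivariate linear model (Block fixed here: a simplifying assumption)
mfit <- lm(Y ~ Block + Variety, data = wide)

# Estimated 4 x 4 error covariance matrix across nitro levels
Sigma_hat <- estVar(mfit)
sqrt(diag(Sigma_hat))                # one error SD per nitro level
```

The diagonal of `Sigma_hat` gives a separate error variance for each level of `nitro`, and the off-diagonal elements carry the within-plot correlation.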
We can form meaningful comparisons of marginal means for `Variety` and `nitro`, as well as for combinations thereof, inasmuch as this model implicitly contains interaction effects.
But my question is, what, if anything, is a sensible reference value $\sigma$ for defining Cohen's $d$-style effect sizes? Each level of `nitro` has its own error variance. I can see that for certain interaction comparisons (comparing levels of `Variety` with `nitro` held fixed at one level), the choice is clear-cut; but it is not at all clear to me what, if anything, defines a Cohen's $d$ for comparing levels of `nitro`, either marginally or at a fixed `Variety`. Can anybody enlighten me?
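To spell out the difficulty in symbols (the notation $\Sigma = (\sigma_{kl})$ for the $4 \times 4$ error covariance matrix across `nitro` levels is introduced here only for illustration): for a `Variety` comparison at fixed `nitro` level $k$, the natural reference is

$$ \sigma = \sqrt{\sigma_{kk}} $$

whereas for comparing `nitro` levels $k$ and $l$, the errors $e_k$ and $e_l$ on the same plot satisfy

$$ \operatorname{Var}(e_k - e_l) = \sigma_{kk} + \sigma_{ll} - 2\sigma_{kl} $$

so at least two candidates suggest themselves, $\sqrt{\sigma_{kk} + \sigma_{ll} - 2\sigma_{kl}}$ (the SD of the within-plot difference) and $\sqrt{(\sigma_{kk} + \sigma_{ll})/2}$ (a pooled per-level SD). I am not claiming that either is standard; they generally give different values of $d$, which is the crux of my question.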