Split plot in R

Question

I have a data set of $n$ benchmarks with $m$ subsamples in each benchmark. I run these benchmarks and their subsamples on $p$ subject machines. The 'individuals' studied by the subsamples are the same for each subject machine, and the benchmarks are the same for each subject machine.

How do I carry out an ANOVA in R in this situation?

Mainly I want to compute the total mean and its confidence interval. I don't care about subsample means at all, but I want the replication there to be reflected in the final means and confidence intervals. I may care about benchmark means, though. I can't work out how to set up this ANOVA in R. I want to be able to reproduce the means by manual calculation.

I have tried glm, anova, aov, and lme but I'm totally confused. I think ANOVA results should be equivalent for two subject machines to the nested mean of machine/benchmark/checkpoint, but the means don't come out the same when I try them.

Edit:

I'm starting to get a clue from http://zoonek2.free.fr/UNIX/48_R/13.html

This question has been asked several times on this list. [This](http://www.stat.ufl.edu/~casella/StatDesign/WebRPrograms/Diet.R) is the short answer. I will write a detailed answer later in the day. — suncoolsu, Aug 03 '11 at 09:13
Thanks for your quick answer. I'm sorry for not finding the other answers, but I'm not clued in enough to the terminology to know how to ask the question yet. — Alex Brown, Aug 04 '11 at 06:00

score 14 · Answer 1 · answered Aug 04 '11 at 07:23

The major difference between a split-plot design and other designs, such as the completely randomized design and variations of block designs, is the nesting structure of subjects: observations are obtained from the same subject (experimental unit) more than once. This leads to a correlation structure within a subject in a split-plot design that differs from the correlation structure in a block.

Let's take an example data set from a simple split-plot design (pictured below). This is a study of the effect of dietary composition on health: four diets were randomly assigned to 12 subjects, all of similar health status. Baseline blood pressure was established, and one measure of health was the change in blood pressure after two weeks. Blood pressure was measured in the morning and in the evening. (The example is taken from Example 5.1 of Casella's Statistical Design book.)

$$ \begin{array}{r|ccccc|l} ~ & \text{Diet} 1 & \text{Diet} 2 & \text{Diet}3 & \text{Diet}4 \\\hline ~ & \text{Subject} & \text{Subject} & \text{Subject} &\text{Subject}\\ ~ & 1 \, 2 \, 3 & 4 \, 5 \, 6 & 7 \, 8 \, 9 & 10 \, 11 \, 12\\\hline \text{Morning} & x \, x \, x & x \, x \, x & x \, x \, x & x \, x \, x\\ \text{Evening} & x \, x \, x & x \, x \, x & x \, x \, x & x \, x \, x\\ \hline \end{array} $$

A few important things to note:

- There are 12 experimental units (12 subjects).
- On these 12 units we observe 24 data points ($2 \times 4 \times 3$), denoted by $x$.
- This is because we take two observations on the same subject, the first in the morning and the second in the evening.
- This means that the two observations on a subject come from the same experimental unit. Therefore, this is not true replication. Because the observations are taken from the same subject over the course of time, there must be some correlation between the two observations.

Note that this is different from a two-way ANOVA with Diet and Time as the factors. A two-way ANOVA would have observations like this:

$$ \begin{array}{r|ccccc|l} ~ & \text{Diet} 1 & \text{Diet} 2 & \text{Diet}3 & \text{Diet}4 \\\hline \text{Morning} & x \, x \, x & x \, x \, x & x \, x \, x & x \, x \, x\\ \text{Evening} & x \, x \, x & x \, x \, x & x \, x \, x & x \, x \, x\\ \hline \end{array} $$

Each of the $x$s here is a different subject.

- The split-plot layout illustrates the concept of nesting. That is, subjects 1, 2, 3 are nested in Diet 1.
- The whole plots, that is, the experimental units at the whole-plot (Diet) level (the Subjects), act as blocks for the split-plot treatment (Morning-Evening).
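The layout described above can be sketched in R. This is only an illustration: the data frame name `dietLayout` and the labels `Diet1` to `Diet4` are assumptions made for the example, not part of Casella's data set.

```r
# Hypothetical layout for the diet example: 4 diets, 3 subjects per diet,
# each subject measured once in the morning and once in the evening.
dietLayout <- expand.grid(Time = c("Morning", "Evening"), Subject = 1:12)

# Subjects 1-3 get Diet1, 4-6 get Diet2, 7-9 get Diet3, 10-12 get Diet4.
dietLayout$Diet <- factor(paste0("Diet", (dietLayout$Subject - 1) %/% 3 + 1))
dietLayout$Subject <- factor(dietLayout$Subject)

nrow(dietLayout)             # number of observations
nlevels(dietLayout$Subject)  # number of experimental units

# Nesting check: every subject appears under exactly one diet ...
dietsPerSubject <- rowSums(table(dietLayout$Subject, dietLayout$Diet) > 0)
all(dietsPerSubject == 1)

# ... but is crossed with Time (one morning and one evening reading each).
all(table(dietLayout$Subject, dietLayout$Time) == 1)
```

In a plain two-way layout the first check would still pass, but each subject would contribute only one row, so the second check would fail.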

The model for this split plot design is:

$$ Y_{ijk} = \mu + \tau_i + S_{ij} + \gamma_{k} + (\tau \gamma)_{ik} + \epsilon_{ijk}, $$

where

$$ Y_{ijk} = \text{the response to diet } i \text{ of subject } j \text{ at time } k, $$

$$ \tau_i = \text{diet } i \text{ effect}, $$

$$ S_{ij} = \text{subject } j\text{'s effect in diet } i \text{ (whole plot error)}, $$

$$ \gamma_k = \text{time } k \text{ effect}, $$

$$ (\tau \gamma)_{ik} = \text{the interaction of diet } i \text{ and time } k, $$

$$ \epsilon_{ijk} = \text{split plot error}. $$

Once you have the model well formulated, writing it in R `aov` form is straightforward:

    splitPltMdl <- aov(bloodPressure ~ Diet +       ## Diet effect
                         Error(Subject/Diet) +      ## nesting of Subject in Diet
                         Time*Diet,                 ## interaction of Time and Diet
                       data = dietData)
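The call above needs a data frame `dietData`, which is not shown here. As a minimal sketch, the block below simulates a stand-in with the same structure and fits the model using a commonly seen spelling, `Diet * Time + Error(Subject)`, which is a substitution and not the formula given above. The simulated values, the seed, and the standard deviations are all assumptions made for illustration; only the column names follow the call above.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Simulated stand-in for dietData: 12 subjects, 3 per diet, 2 times each.
dietData <- expand.grid(Time = c("Morning", "Evening"), Subject = 1:12)
dietData$Diet <- factor(paste0("Diet", (dietData$Subject - 1) %/% 3 + 1))
dietData$Subject <- factor(dietData$Subject)

# Response = subject-level noise (whole-plot error) plus observation-level
# noise (split-plot error). No real diet or time effect is built in.
subjectEffect <- rnorm(12, sd = 3)
dietData$bloodPressure <- subjectEffect[as.integer(dietData$Subject)] +
  rnorm(nrow(dietData), sd = 1)

# Diet * Time expands to Diet + Time + Diet:Time. With Error(Subject),
# Diet is tested in the Subject stratum, Time and Diet:Time in the
# within-subject stratum.
fit <- aov(bloodPressure ~ Diet * Time + Error(Subject), data = dietData)
summary(fit)

# Residual degrees of freedom in each stratum.
fit$Subject$df.residual  # between subjects: 12 - 1 - 3
fit$Within$df.residual   # within subjects: 24 - 12 - (1 + 3)
```

The two error strata in the summary correspond to $S_{ij}$ (whole plot error) and $\epsilon_{ijk}$ (split plot error) in the model. Subjects must carry unique labels across diets for `Error(Subject)` to identify the 12 experimental units.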

+1 nice answer. If you could now explain to me how to do post-hoc tests or planned comparisons (e.g., is there a difference between morning and evening in groups 1 & 2, pooled), you would answer a lot of my questions. See also my question on R-help: http://article.gmane.org/gmane.comp.lang.r.general/237681 — Henrik, Aug 12 '11 at 13:47
I am a bit busy at the moment. I will definitely get back to you. — suncoolsu, Aug 12 '11 at 20:52
Why don't you include the Time effect in your model formula? Why do you include `Diet` and `Time*Diet` in the `aov` call? It should be `Time:Diet` to match your mathematical formula. — amoeba, Aug 29 '16 at 12:21
