I have a problem like the following:
1) There are six measurements for each individual, with large within-subject variance
2) There are two groups (Treatment and Control)
3) Each group consists of 5 individuals
4) I want to perform a significance test comparing the two groups, to know whether the group means differ from one another.
The data have one row per measurement, with Subject and Value columns (the actual data appear as dput() output in the EDIT below).
I have run some simulations with the code below, which does t-tests comparing the group means, where each group mean is calculated as the mean of the individual (per-subject) means. This ignores the within-subject variability:
n.simulations <- 10000
pvals <- matrix(nrow=n.simulations, ncol=1)

for(k in 1:n.simulations){
  # Subject labels: 10 subjects, 6 measurements each
  subject <- NULL
  for(i in 1:10){
    subject <- rbind(subject, as.matrix(rep(i, 6)))
  }
  #set.seed(42)
  # Sample subject means
  subject.means <- rnorm(10, 100, 2)
  # Sample individual measurements (large within-subject SD)
  values <- NULL
  for(sm in subject.means){
    values <- rbind(values, as.matrix(rnorm(6, sm, 20)))
  }
  out <- cbind(subject, values)
  # Split into GroupA and GroupB (5 subjects each)
  GroupA <- out[1:30, ]
  GroupB <- out[31:60, ]
  # Add effect size to GroupA (0 here, so the null hypothesis is true)
  GroupA[, 2] <- GroupA[, 2] + 0
  colnames(GroupA) <- c("Subject", "Value")
  colnames(GroupB) <- c("Subject", "Value")
  # Calculate individual means and SDs
  GroupA.summary <- matrix(nrow=length(unique(GroupA[, 1])), ncol=2)
  for(i in 1:length(unique(GroupA[, 1]))){
    GroupA.summary[i, 1] <- mean(GroupA[which(GroupA[, 1] == unique(GroupA[, 1])[i]), 2])
    GroupA.summary[i, 2] <- sd(GroupA[which(GroupA[, 1] == unique(GroupA[, 1])[i]), 2])
  }
  colnames(GroupA.summary) <- c("Mean", "SD")
  GroupB.summary <- matrix(nrow=length(unique(GroupB[, 1])), ncol=2)
  for(i in 1:length(unique(GroupB[, 1]))){
    GroupB.summary[i, 1] <- mean(GroupB[which(GroupB[, 1] == unique(GroupB[, 1])[i]), 2])
    GroupB.summary[i, 2] <- sd(GroupB[which(GroupB[, 1] == unique(GroupB[, 1])[i]), 2])
  }
  colnames(GroupB.summary) <- c("Mean", "SD")
  Summary <- rbind(cbind(1, GroupA.summary), cbind(2, GroupB.summary))
  colnames(Summary)[1] <- "Group"
  # t-test comparing the two sets of subject means
  pvals[k] <- t.test(GroupA.summary[, 1], GroupB.summary[, 1], var.equal=TRUE)$p.value
}
And here is the code for the plots:
# Plots
par(mfrow=c(2, 2))

# Group A: raw values by subject, with per-subject means and 95% CIs
boxplot(GroupA[, 2] ~ GroupA[, 1], col="Red", main="Group A",
        ylim=c(.9*min(out[, 2]), 1.1*max(out[, 2])),
        xlab="Subject", ylab="Value")
stripchart(GroupA[, 2] ~ GroupA[, 1], vert=TRUE, pch=16, add=TRUE)
#abline(h=mean(GroupA[,2]), lty=2, lwd=3)
for(i in 1:length(unique(GroupA[, 1]))){
  m  <- mean(GroupA[which(GroupA[, 1] == unique(GroupA[, 1])[i]), 2])
  ci <- t.test(GroupA[which(GroupA[, 1] == unique(GroupA[, 1])[i]), 2])$conf.int[1:2]
  points(i - .2, m, pch=15, cex=1.5, col="Grey")
  segments(i - .2, ci[1], i - .2, ci[2], lwd=4, col="Grey")
}
legend("topleft", legend=c("Individual Means +/- 95% CI"), bty="n", pch=15, lwd=3, col="Grey")

# Group B: raw values by subject, with per-subject means and 95% CIs
boxplot(GroupB[, 2] ~ GroupB[, 1], col="lightblue", main="Group B",
        ylim=c(.9*min(out[, 2]), 1.1*max(out[, 2])),
        xlab="Subject", ylab="Value")
stripchart(GroupB[, 2] ~ GroupB[, 1], vert=TRUE, pch=16, add=TRUE)
#abline(h=mean(GroupB[,2]), lty=2, lwd=3)
for(i in 1:length(unique(GroupB[, 1]))){
  m  <- mean(GroupB[which(GroupB[, 1] == unique(GroupB[, 1])[i]), 2])
  ci <- t.test(GroupB[which(GroupB[, 1] == unique(GroupB[, 1])[i]), 2])$conf.int[1:2]
  points(i - .2, m, pch=15, cex=1.5, col="Grey")
  segments(i - .2, ci[1], i - .2, ci[2], lwd=4, col="Grey")
}
legend("topleft", legend=c("Individual Means +/- 95% CI"), bty="n", pch=15, lwd=3, col="Grey")

# Individual averages by group, with group means and 95% CIs
boxplot(Summary[, 2] ~ Summary[, 1], col=c("Red", "lightblue"), xlab="Group", ylab="Average Value",
        ylim=c(.9*min(Summary[, 2]), 1.1*max(Summary[, 2])),
        main="Individual Averages")
stripchart(Summary[, 2] ~ Summary[, 1], vert=TRUE, pch=16, add=TRUE)
points(.9, mean(GroupA.summary[, 1]), pch=15, cex=1.5, col="Grey")
segments(.9, t.test(GroupA.summary[, 1])$conf.int[1],
         .9, t.test(GroupA.summary[, 1])$conf.int[2], lwd=4, col="Grey")
points(1.9, mean(GroupB.summary[, 1]), pch=15, cex=1.5, col="Grey")
segments(1.9, t.test(GroupB.summary[, 1])$conf.int[1],
         1.9, t.test(GroupB.summary[, 1])$conf.int[2], lwd=4, col="Grey")
legend("topleft", legend=c("Group Means +/- 95% CI"), bty="n", pch=15, lwd=3, col="Grey")

# Distribution of p-values across simulations
hist(pvals, breaks=seq(0, 1, by=.05), col="Grey",
     main=c(paste("# sims=", n.simulations),
            paste("% Sig p-values=", 100*length(which(pvals < 0.05))/length(pvals))))
Now, it seems to me that because each individual mean is itself an estimate, we should be less certain about the group means than the 95% confidence intervals in the bottom-left panel of the resulting figure suggest. Thus the calculation underestimates the true variability, and the resulting p-values should lead to an increased false-positive rate if we wish to extrapolate to future data.
So what is the correct way to analyze this data?
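For what it is worth, the kind of alternative I have in mind is a random-intercept mixed model fitted to the raw measurements rather than to the subject means. Below is only a sketch, using lme4/lmerTest (assumed installed) on one simulated data set, namely the out object left over from the last iteration of the loop above; I am not claiming this is the correct analysis, that is exactly what I am asking.
# Sketch only: random-intercept model on the raw measurements of one simulated data set
library(lmerTest)  # lmer() plus Satterthwaite p-values

dd <- data.frame(Subject = factor(out[, 1]),
                 Group   = factor(ifelse(out[, 1] <= 5, "A", "B")),  # subjects 1-5 were Group A above
                 Value   = out[, 2])

fit <- lmer(Value ~ Group + (1 | Subject), data = dd)
anova(fit)    # F-test for the Group effect
summary(fit)  # variance components: between-subject vs within-subject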
Bonus:
The example above is a simplification. For the actual data:
1) The within-subject variance is positively correlated with the mean.
2) Values can only be multiples of two.
3) The individual measurements are not even roughly normally distributed: they have a floor effect at zero and long tails at the positive end.
4) The number of subjects in each group is not necessarily equal.
Previous literature has used the t-test, ignoring within-subject variability and the other nuances, as was done in the simulations above. Are those results reliable? And if I can extract the means and standard errors from the published figures, how would I calculate the "correct" p-values?
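To make that last question concrete, here is a sketch of what I mean by calculating a p-value from extracted summaries only: a two-sample Welch t-test reconstructed from the group means, SEMs and subject counts. The function name and the numbers are made-up placeholders, not real data, and whether this reconstructed p-value is the "correct" one is the crux of my question.
# Sketch: two-sample t-test reconstructed from summary statistics only.
# m = group mean of the subject means, se = reported SEM of those means,
# n = number of subjects per group. All values below are made-up placeholders.
t_from_summaries <- function(m1, se1, n1, m2, se2, n2) {
  t  <- (m1 - m2) / sqrt(se1^2 + se2^2)
  # Welch-Satterthwaite degrees of freedom
  df <- (se1^2 + se2^2)^2 / (se1^4/(n1 - 1) + se2^4/(n2 - 1))
  p  <- 2 * pt(-abs(t), df)
  c(t = t, df = df, p = p)
}
t_from_summaries(m1 = 12, se1 = 1.5, n1 = 7,
                 m2 = 18, se2 = 2.0, n2 = 9)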
EDIT:
OK, here is what the actual data look like. There are also three groups rather than two:
dput() of the data (assigned to dat, which the code below uses):
dat <- structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3,
3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6,
6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10,
10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12,
12, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 15, 15, 15,
15, 15, 15, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18,
18, 18, 18, 18, 18, 2, 0, 16, 2, 16, 2, 8, 10, 8, 6, 4, 4, 8,
22, 12, 24, 16, 8, 24, 22, 6, 10, 10, 14, 8, 18, 8, 14, 8, 20,
6, 16, 6, 6, 16, 4, 2, 14, 12, 10, 4, 10, 10, 8, 4, 10, 16, 16,
2, 8, 4, 0, 0, 2, 16, 10, 16, 12, 14, 12, 8, 10, 12, 8, 14, 8,
12, 20, 8, 14, 2, 4, 8, 16, 10, 14, 8, 14, 12, 8, 14, 4, 8, 8,
10, 4, 8, 20, 8, 12, 12, 22, 14, 12, 26, 32, 22, 10, 16, 26,
20, 12, 16, 20, 18, 8, 10, 26), .Dim = c(108L, 3L), .Dimnames = list(
NULL, c("Group", "Subject", "Value")))
EDIT 2:
In response to Henrik's answer: if I instead perform an ANOVA followed by the TukeyHSD procedure on the individual averages, as shown below, could I interpret this as underestimating my p-values by about 3-4x?
My goal with this part of the question is to understand how I, as a reader of a journal article, can better interpret previous results given the authors' choice of analysis method. For example, they have those "stars of authority" showing me 0.01 > p > 0.001. So if I accept 0.05 as a reasonable cutoff, should I accept their interpretation? The only additional information available is the mean and SEM.
# Get individual (per-subject) means and SDs
summary <- NULL
for(i in unique(dat[, 2])){
  sub <- which(dat[, 2] == i)
  summary <- rbind(summary, cbind(
    dat[sub, 1][3],   # Group (constant within a subject, so any element works)
    dat[sub, 2][4],   # Subject
    mean(dat[sub, 3]),
    sd(dat[sub, 3])
  ))
}
colnames(summary) <- c("Group", "Subject", "Mean", "SD")

TukeyHSD(aov(summary[, 3] ~ as.factor(summary[, 1]) + (1|summary[, 2])))
# Tukey multiple comparisons of means
# 95% family-wise confidence level
#
# Fit: aov(formula = summary[, 3] ~ as.factor(summary[, 1]) + (1 | summary[, 2]))
#
# $`as.factor(summary[, 1])`
# diff lwr upr p adj
# 2-1 -0.672619 -4.943205 3.597967 0.9124024
# 3-1 7.507937 1.813822 13.202051 0.0098935
# 3-2 8.180556 2.594226 13.766885 0.0046312
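And for comparison, here is my attempt at the same Tukey-style comparisons from a random-intercept model fitted to the raw measurements instead of the per-subject averages. This is only a sketch; it assumes the lme4 and multcomp packages, and I am not sure it is the right specification.
# Sketch: pairwise group comparisons from a mixed model on the raw data
library(lme4)
library(multcomp)

dd <- as.data.frame(dat)          # dat is the dput() object above
dd$Group   <- factor(dd$Group)
dd$Subject <- factor(dd$Subject)

fit <- lmer(Value ~ Group + (1 | Subject), data = dd)
summary(glht(fit, linfct = mcp(Group = "Tukey")))  # Tukey contrasts between groups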
EDIT 3: I think we are getting close to what I am trying to understand. Here is the simulation described in the comments to @Stephane:
# Get subject means (aggregate() needs a data frame, so convert the dput() matrix first)
dat <- as.data.frame(dat)
means <- aggregate(Value ~ Group + Subject, data=dat, FUN=mean)

# Initialize "dat2" dataframe
dat2 <- dat

# Initialize within-subject SD
s <- .001

pvals <- matrix(nrow=10000, ncol=2)
for(j in 1:10000){
  # Sample 6 individual measurements for each subject around its observed mean
  temp <- NULL
  for(i in 1:nrow(means)){
    temp <- c(temp, rnorm(6, means[i, 3], s))
  }
  # Set new values
  dat2[, 3] <- temp
  # Take means of the sampled values and fit the between-group model
  dd2  <- aggregate(Value ~ Group + Subject, data=dat2, FUN=mean)
  fit2 <- lm(Value ~ factor(Group), data=dd2)   # Group as a categorical factor, as in the aov above
  # Save the within-subject SD and the p-value for the Group effect
  pvals[j, ] <- cbind(s, anova(fit2)[["Pr(>F)"]][1])
  # Update SD
  s <- s + .001
}

plot(pvals[, 1], pvals[, 2], xlab="Within-Subject SD", ylab="P-value")
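For reference against the x-axis of that plot, the observed average within-subject SD in the actual data can be computed as below (just the mean of the per-subject SDs; I am assuming that is a reasonable summary for locating the real data on the curve):
# Mean of the per-subject SDs in the actual data, to locate it on the plot's x-axis
mean(aggregate(Value ~ Subject, data = as.data.frame(dat), FUN = sd)$Value)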