7

I have a lsmeans problem in R. I want to do a post-hoc analysis of an interaction, similar to examples provided in the lsmeans documentation.

I am puzzled by the fact that the p-values are the same whether I use

warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)

(A):

lsmeans(warp.lm, list(pairwise ~ wool|tension, pairwise ~ tension|wool))

or (B):

lsmeans(warp.lm, list(pairwise ~ wool|tension))

Should the p-values not be higher in the case of (A) than (B)? (A) makes 9 post-hoc comparisons, (B) only 3.

Can I be sure that (A) is adjusted for all 9 comparisons?

Thank you!

Russ Lenth
  • 15,161
  • 20
  • 53
Sam
  • 73
  • 1
  • 1
  • 5
  • What do you mean by properly? I suggest using the lsmeans function first without the left-hand side , then use `pairs` to get the contrasts you want. Read the vignette "using-lsmeans" for a lt of examples and discussion. – Russ Lenth Aug 07 '15 at 13:31
  • @rvl Sorry this wasn't clear. For both commands you get the same p-values. tension = L: contrast estimate SE df t.ratio p.value A - B 16.333333 5.157299 48 3.167 0.0027 tension = M: A - B -4.777778 5.157299 48 -0.926 0.3589 tension = H: A - B 5.777778 5.157299 48 1.120 0.2682 Should the p-values not be higher after the first command than the second? There are more comparisons to be made after the first command. I tried the "pairs" route but the results are the same as "pairwise". – Sam Aug 07 '15 at 14:54
  • The p-value adjustments are applied to each grouping separately. If you want one adjustment for the whole set of comparisons, you'll need to do custom contrasts. – Russ Lenth Aug 07 '15 at 15:11
  • I have an answer suggestion to post, but I can't post it because the question was put on hold. If it can be taken off hold, I'll post it. – Russ Lenth Aug 07 '15 at 21:20
  • @Scortchi Could you please take the question off hold so that rvl could answer it? I have edited it so hopefully you will find it clearer. Thank you. – Sam Aug 10 '15 at 08:01
  • @Sam: I have done - thanks for editing it. (Note that once you edit a question it'll automatically join a review queue for re-opening.) But it's still as clear as mud to anyone unfamiliar with R's `lsmeans` package & I'd be grateful if you'd edit it further to explain what statistical analysis you're trying to carry out, & why the results are unexpected. – Scortchi - Reinstate Monica Aug 10 '15 at 08:56
  • @rvl: I've re-opened this question so you can answer - do please explain the underlying statistical issues as well as illustrating the analysis with `lsmeans`. – Scortchi - Reinstate Monica Aug 10 '15 at 09:00

2 Answers2

8

Provision in upcoming version of lsmeans

The next update of lsmeans (2.20 or later) will include an rbind method for ref.grid and lsmobj objects. It makes it easy to combine two or ore objects into one family, and defaults to the "mvt" adjustment method. Here is the present example:

> w.t <- pairs(lsmeans(warp.lm, ~ wool | tension))
> t.w <- pairs(lsmeans(warp.lm, ~ tension | wool))

> rbind(w.t, t.w)
 tension wool contrast   estimate       SE df t.ratio p.value
 L       .    A - B    16.3333333 5.157299 48   3.167  0.0203
 M       .    A - B    -4.7777778 5.157299 48  -0.926  0.9119
 H       .    A - B     5.7777778 5.157299 48   1.120  0.8258
 .       A    L - M    20.5555556 5.157299 48   3.986  0.0018
 .       A    L - H    20.0000000 5.157299 48   3.878  0.0027
 .       A    M - H    -0.5555556 5.157299 48  -0.108  1.0000
 .       B    L - M    -0.5555556 5.157299 48  -0.108  1.0000
 .       B    L - H     9.4444444 5.157299 48   1.831  0.3793
 .       B    M - H    10.0000000 5.157299 48   1.939  0.3191

P value adjustment: mvt method for 9 tests 

If you do specify Tukey instead of mvt, it comes fairly close to the same result:

> test(rbind(w.t, t.w), adjust = "tukey")
 tension wool contrast   estimate       SE df t.ratio p.value
 L       .    A - B    16.3333333 5.157299 48   3.167  0.0196
 M       .    A - B    -4.7777778 5.157299 48  -0.926  0.8683
 H       .    A - B     5.7777778 5.157299 48   1.120  0.7727
 .       A    L - M    20.5555556 5.157299 48   3.986  0.0018
 .       A    L - H    20.0000000 5.157299 48   3.878  0.0026
 .       A    M - H    -0.5555556 5.157299 48  -0.108  0.9999
 .       B    L - M    -0.5555556 5.157299 48  -0.108  0.9999
 .       B    L - H     9.4444444 5.157299 48   1.831  0.3467
 .       B    M - H    10.0000000 5.157299 48   1.939  0.2922

P value adjustment: tukey method for comparing a family of 4.772 estimates

The curiously fractional family size results from ${4.772 \choose 2} = 9$.

Russ Lenth
  • 15,161
  • 20
  • 53
7

Both of the lsmeans statements you show generate lists of lsmobjs, and each element of those lists is handled separately. If you want to incorporate an overall adjustment for two or more lists combined, it is technical and it takes a bit of work.

First, save the list:

lsmlist = lsmeans(warp.lm, list(pairwise ~ wool|tension, 
                  pairwise ~ tension|wool))

This creates a list of 4 lsmobjs (originally two lists of two)

> names(lsmlist)
[1] "lsmeans of wool | tension"                          
[2] "pairwise differences of contrast, tension | tension"
[3] "lsmeans of tension | wool"                          
[4] "pairwise differences of contrast, wool | wool"

The user wants to combine the three comparisons in lsmlist[[2]] with the six in lsmlist[[4]] and have an overall multiplicity adjustment for those 9 comparisons.

To start, create a new lsmobj from one of the results, and fix it up.

mydiffs = lsmlist[[4]]

First, bind together the linear functions for the two sets of comparisons:

mydiffs@linfct = rbind(lsmlist[[2]]@linfct, lsmlist[[4]]@linfct)

We also need to define the grid slot, which defines the factors associated with each linear function. To make it simple, I just define a factor named contrast with 9 levels for the 9 contrasts (something fancier could be done here).

mydiffs@grid = data.frame(contrast = 1:9)

Finally, fix up the auxiliary info that does the bookkeeping. We now use our new contrast factor as the only variable, with no "by" variables. For the combined family of 9 contrasts, the Tukey adjustment makes no sense so we use the multivariate $t$ ("mvt") method:

mydiffs = update(mydiffs, pri.vars = "contrast", by.vars = NULL,
                 adjust="mvt")

Now, we can look at the resulting summary:

> mydiffs

 contrast   estimate       SE df t.ratio p.value
        1 16.3333333 5.157299 48   3.167  0.0205
        2 -4.7777778 5.157299 48  -0.926  0.9119
        3  5.7777778 5.157299 48   1.120  0.8258
        4 20.5555556 5.157299 48   3.986  0.0019
        5 20.0000000 5.157299 48   3.878  0.0026
        6 -0.5555556 5.157299 48  -0.108  1.0000
        7 -0.5555556 5.157299 48  -0.108  1.0000
        8  9.4444444 5.157299 48   1.831  0.3796
        9 10.0000000 5.157299 48   1.939  0.3190

P value adjustment: mvt method for 9 tests

I could consider adding a feature for combining pieces of lsm.list objects, but it is not straightforward -- primarily because of complications in obtaining meaningful labels for the results (the grid part). It is also a problem that different users expect different defaults.

Russ Lenth
  • 15,161
  • 20
  • 53
  • fantastic answers as always. – Adam Robinsson Aug 11 '15 at 18:31
  • Thank you so much for your answer, this is brilliant! If there could be a warning message incorporated that would be useful, I think. This default does produce a lot of false positives. – Sam Aug 13 '15 at 13:21
  • Please note -- I posted a question related to this -- http://stats.stackexchange.com/questions/167097/its-all-in-the-family-but-do-we-include-the-in-laws-too – Russ Lenth Aug 14 '15 at 00:05