
I'm analyzing a data set using a mixed-effects model with one fixed effect (condition) and two random effects (participant, due to the within-subject design, and pair). The model was fitted with the lme4 package: `exp.model <- lmer(outcome ~ condition + (1|participant) + (1|pair), data = exp)`.

Next, I performed a likelihood ratio test of this model against the model without the fixed effect (condition) and found a significant difference. There are 3 conditions in my data set, so I want to do multiple comparisons, but I am not sure which method to use. I found a number of similar questions on CrossValidated and other forums, but I am still quite confused.
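For reference, the likelihood ratio test was done roughly like this (`exp.null` is just my name for the reduced model):

    library(lme4)

    # Full model with the fixed effect of condition
    exp.model <- lmer(outcome ~ condition + (1 | participant) + (1 | pair), data = exp)

    # Reduced model without condition, same random effects
    exp.null <- lmer(outcome ~ 1 + (1 | participant) + (1 | pair), data = exp)

    # Likelihood ratio test; anova() refits both models with ML (REML = FALSE)
    anova(exp.null, exp.model)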

From what I've seen, people have suggested using:

1. The lsmeans package - `lsmeans(exp.model, pairwise ~ condition)`, which gives me the following output:

$lsmeans
 condition     lsmean         SE    df  lower.CL  upper.CL
 Condition1 0.6538060 0.03272705 47.98 0.5880030 0.7196089
 Condition2 0.7027413 0.03272705 47.98 0.6369384 0.7685443
 Condition3 0.7580522 0.03272705 47.98 0.6922493 0.8238552

Confidence level used: 0.95 

$contrasts
 contrast                   estimate         SE    df t.ratio p.value
 Condition1 - Condition2 -0.04893538 0.03813262 62.07  -1.283  0.4099
 Condition1 - Condition3 -0.10424628 0.03813262 62.07  -2.734  0.0219
 Condition2 - Condition3 -0.05531090 0.03813262 62.07  -1.450  0.3217

P value adjustment: tukey method for comparing a family of 3 estimates 

2. The multcomp package, in two different ways. Using `mcp`: `glht(exp.model, mcp(condition="Tukey"))`, resulting in

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lmer(formula = outcome ~ condition + (1 | participant) + (1 | pair), 
    data = exp, REML = FALSE)

Linear Hypotheses:
                             Estimate Std. Error z value Pr(>|z|)  
Condition2 - Condition1 == 0  0.04894    0.03749   1.305    0.392  
Condition3 - Condition1 == 0  0.10425    0.03749   2.781    0.015 *
Condition3 - Condition2 == 0  0.05531    0.03749   1.475    0.303  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

and using `lsm`: `glht(exp.model, lsm(pairwise ~ condition))`, resulting in

Note: df set to 62

     Simultaneous Tests for General Linear Hypotheses

Fit: lmer(formula = outcome ~ condition + (1 | participant) + (1 | pair), 
    data = exp, REML = FALSE)

Linear Hypotheses:
                             Estimate Std. Error t value Pr(>|t|)  
Condition1 - Condition2 == 0 -0.04894    0.03749  -1.305   0.3977  
Condition1 - Condition3 == 0 -0.10425    0.03749  -2.781   0.0195 *
Condition2 - Condition3 == 0 -0.05531    0.03749  -1.475   0.3098  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

As you can see, the methods give different results. This is my first time working with R and stats, so something might be going wrong, but I wouldn't know where. My questions are:

1. What are the differences between the presented methods? I read in an answer to a related question that it's about the degrees of freedom (lsmeans vs. glht).
2. Are there rules or recommendations for when to use which one, i.e., is method 1 good for this type of data set/model, etc.?
3. Which result should I report? Without knowing better, I'd probably just report the highest p-value I got to play it safe, but it would be nice to have a better reason.

Thanks

schvaba986

2 Answers


Not a complete answer...

The difference between `glht(myfit, mcp(myfactor="Tukey"))` and the two other methods is that this way uses a "z" statistic (normal distribution), whereas the other ones use a "t" statistic (Student distribution). The "z" statistic is the same as a "t" statistic with infinite degrees of freedom. This method is asymptotic: it provides smaller p-values and shorter confidence intervals than the other ones, and if the data set is small, the p-values can be too small and the confidence intervals too short.
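As a quick illustration, take the statistic 2.781 reported by glht for Condition3 - Condition1 and compare the two reference distributions (unadjusted two-sided p-values, so smaller than the Tukey-adjusted ones in the question):

    stat <- 2.781                 # z value reported by glht above
    2 * pnorm(-abs(stat))         # normal ("z") reference: ~0.0054
    2 * pt(-abs(stat), df = 62)   # "t" reference with 62 df: ~0.0072, larger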

When I run `lsmeans(myfit, pairwise~myfactor)` the following message appears:

Loading required namespace: pbkrtest

That means that lsmeans (for a lmer model) uses the pbkrtest package, which implements the Kenward-Roger method for the degrees of freedom of the "t" statistic. This method is intended to provide better p-values and confidence intervals than the asymptotic one (there's no difference when the degrees of freedom are large).
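If you want to see what pbkrtest computes, you can call it directly (a sketch reusing `exp.model` from the question, with `exp.null` standing for the reduced model without the fixed effect):

    library(pbkrtest)

    # Kenward-Roger test of the fixed effect, full vs. reduced model
    KRmodcomp(exp.model, exp.null)

    # Bias-adjusted covariance matrix of the fixed-effect estimates
    vcovAdj(exp.model)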

Now, about the difference between `lsmeans(myfit, pairwise~myfactor)$contrasts` and `glht(myfit, lsm(pairwise~myfactor))`: I have just done some tests and my observations are the following:

  • `lsm` is an interface between the lsmeans package and the multcomp package (see `?lsm`)

  • for a balanced design there's no difference between the results

  • for an unbalanced design, I observed small differences between the results (the standard errors and the t ratio)

Unfortunately I do not know the cause of these differences. It looks like lsm calls lsmeans only to get the linear hypotheses matrix and the degrees of freedom, but lsmeans uses a different way to calculate the standard errors.
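A sketch to compare the two routes directly (using the model from the question; the comments give my interpretation of where the standard errors come from):

    # The same contrasts through the two routes
    lsm_res  <- lsmeans(exp.model, pairwise ~ condition)$contrasts
    glht_res <- glht(exp.model, lsm(pairwise ~ condition))

    summary(lsm_res)    # SEs based on pbkrtest's adjusted covariance matrix
    summary(glht_res)   # SEs based on the model's own covariance matrix
    glht_res$linfct     # the linear hypotheses matrix that lsm() built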

Stéphane Laurent
  • Thanks for the detailed response! I missed the difference in test statistic completely... You mention that the values can be too small and CIs too narrow for the asymptotic method. My data set consists of ~30 participants, so I guess I'll stick to the t statistic. When you say that the Kenward-Roger method leads to better p-values, do you mean more accurate or smaller? So the differences are due to differences in df and SE calculation methods and not due to incorrect use of one of them with my model, if I understood you correctly. Is there a way to pick the "best" method here? – schvaba986 Mar 31 '16 at 16:09
  • (I am the **lsmeans** package developer) `lsmeans` uses the pbkrtest package, which provides for (1) Kenward-Roger d.f. calculations and (2) an adjusted covariance matrix with reduced bias in the estimates. If you first set `lsm.options(disable.pbkrtest=TRUE)`, then the `lsmeans` call with `adjust="mvt"` will yield the same results as `glht`, except for slight differences due to the randomized algorithm used by both packages for the multivariate t distribution. – Russ Lenth Apr 01 '16 at 00:55
  • However, I suggest the "mvt" adjustment without disabling pbkrtest, because of the bias adjustment and the fact that with no d.f., asymptotic (z) values essentially assume infinite d.f., thereby yielding unrealistically low P values. – Russ Lenth Apr 01 '16 at 00:56
  • By the way, the `summary` method for `glht` allows for various step-down testing methods besides the default one-step (simultaneous CIs) multiplicity adjustment. On a completely different point, if you have more than one factor, `lsm` can create the usual types of comparisons quite easily, while `mcp` can't do it at all. – Russ Lenth Apr 01 '16 at 01:03

(This answer highlights some information already present in one comment by @RussLenth above)

In situations where the difference due to the "z" vs. "t" statistic is negligible (i.e., in the limit of an infinite number of observations), calling lsmeans and multcomp with their default treatment of multiple comparisons would still give different results, because by default they use different methods (tukey vs. single-step). To remove this difference, lsmeans and multcomp need to be called with the same multiple-comparison method (e.g., none):

    lsmeans(exp.model, pairwise ~ condition, adjust = "none")
    summary(glht(exp.model, mcp(condition = "Tukey")), test = adjusted(type = "none"))
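Conversely, following Russ Lenth's comments above, the two can be made to agree while keeping an adjustment, by disabling pbkrtest and using the "mvt" adjustment on the lsmeans side (a sketch based on his comment; small differences remain because both packages compute the multivariate t with a randomized algorithm):

    lsm.options(disable.pbkrtest = TRUE)
    lsmeans(exp.model, pairwise ~ condition, adjust = "mvt")
    summary(glht(exp.model, mcp(condition = "Tukey")))  # default single-step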

fabiob