I'm using lsmeans to compare the means of several groups via the contrast argument. The data follow a Gaussian distribution. Would an unbalanced data structure inflate the type I error rate?

ziab_m

1 Answer

lsmeans uses $t$ tests: each statistic is the observed contrast divided by its estimated standard error. Both quantities come from the fitted model, so the validity of the result depends on the validity of the model.
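As a sketch of that statistic in symbols (the notation here is an assumption for illustration, not taken from the package documentation): let $\hat\mu_1, \dots, \hat\mu_k$ be the estimated group means, $c_1, \dots, c_k$ the contrast coefficients with $\sum_i c_i = 0$, $\hat V$ the model-based covariance matrix of the $\hat\mu_i$, and $\nu$ the residual degrees of freedom. Then

$$
t = \frac{\sum_{i=1}^{k} c_i \hat\mu_i}{\sqrt{c^\top \hat V c}}, \qquad t \sim t_\nu \text{ under } H_0 : \sum_{i=1}^{k} c_i \mu_i = 0 .
$$

Unequal group sizes change $\hat V$, and hence the standard error, but not the reference distribution, provided the model assumptions hold.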

If, for example, you fitted a model using lm(), the errors are actually normally distributed with common variance (as assumed in the model), and the model structure itself is correct (no missing predictors), then the $t$ statistics from lsmeans() are correct, even with unbalanced data. In that case the type I error rate is not biased, on a per-test basis, when unadjusted tests are used.
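One way to check this claim empirically is a small simulation. The sketch below uses base R only and makes several assumptions for illustration: the group sizes, the number of replications, and the choice to read the b-versus-a comparison directly from the lm() coefficient table (which, for a one-factor model, is the same unadjusted $t$ test as the corresponding pairwise contrast).

```r
set.seed(1)

# Deliberately unbalanced group sizes (illustrative values)
n <- c(5, 20, 50)
g <- factor(rep(c("a", "b", "c"), times = n))

n_sim <- 2000  # number of simulated data sets (illustrative)

p <- replicate(n_sim, {
  # All groups share the same mean and variance, so the null is true
  y <- rnorm(length(g))
  fit <- lm(y ~ g)
  # Unadjusted t test of group b versus the reference group a
  summary(fit)$coefficients["gb", "Pr(>|t|)"]
})

# Empirical type I error rate at the 5% level
rejection_rate <- mean(p < 0.05)
print(rejection_rate)
```

If the model assumptions hold, the printed rate should be close to the nominal 0.05, up to simulation noise, despite the imbalance.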

However, most multiplicity adjustments, e.g. Tukey, are approximate when there is imbalance. The `"mvt"` adjustment is exact in principle, but has slight anomalies because the P values are computed using a simulation method.
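To see the two kinds of tests side by side, here is a minimal sketch using emmeans, the successor to lsmeans. The data set (the built-in `warpbreaks` with a few rows dropped to create imbalance) and the model are assumptions chosen purely for illustration.

```r
library(emmeans)  # successor to lsmeans

# Create an unbalanced data set by dropping the first five rows
dat <- warpbreaks[-(1:5), ]
print(table(dat$tension))

fit <- lm(breaks ~ tension, data = dat)
emm <- emmeans(fit, ~ tension)

# Unadjusted pairwise t tests (valid per test if the model holds)
unadj <- contrast(emm, method = "pairwise", adjust = "none")
print(summary(unadj))

# Multivariate-t adjustment; its P values are computed by simulation,
# so a seed is set here to make the run reproducible
set.seed(123)
mvt <- contrast(emm, method = "pairwise", adjust = "mvt")
print(summary(mvt))
```

The estimates and standard errors are identical in both tables; only the P values differ.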

Russ Lenth
  • So for unbalanced datasets, you suggest using the "mvt" method? Do you know if lsmeans supports linear mixed effects models generated by lme4? – ziab_m Jun 19 '18 at 01:25
  • Yes, but I suggest switching to the **emmeans** package (successor to **lsmeans**) where all new development is taking place. – Russ Lenth Jun 19 '18 at 01:28
  • PS: look at `vignette("models", "emmeans")` for info and details on what models are supported. – Russ Lenth Jun 19 '18 at 01:30