
I have recently learned about LS means (estimated marginal means, predicted marginal means) and I am trying to understand what they could be used for and under what circumstances.

For concreteness, consider a dependent variable $y$ and two categorical independent variables, $x_1$ with two categories and $x_2$ with three categories. One could create dummy variables corresponding to these categories and call them $d_{1,1}, d_{1,2}$ and $d_{2,1}, d_{2,2}, d_{2,3}$. One could then have a linear model (without interaction terms) $$ y = \beta_0 + \beta_{1,2} d_{1,2} + \beta_{2,2} d_{2,2} + \beta_{2,3} d_{2,3} + \varepsilon $$ where $d_{1,1}$ and $d_{2,1}$ are the reference categories. LS means for $x_1$ would be \begin{align} \bar y_{1,1} &= \beta_0 &+ \frac{1}{3}(\beta_{2,2} + \beta_{2,3}), \\ \bar y_{1,2} &= \beta_0 + \beta_{1,2} &+ \frac{1}{3}(\beta_{2,2} + \beta_{2,3}). \\ \end{align}
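As a concrete sketch of the formulas above, one can fit the dummy regression and compute the LS means for $x_1$ by hand. The data below are simulated (all coefficient values and category proportions are hypothetical, chosen only for illustration); note that the cells are deliberately unbalanced, so the LS means need not coincide with raw group means.

```python
import numpy as np

# Simulated unbalanced data: x1 has 2 levels, x2 has 3 levels.
rng = np.random.default_rng(0)
n = 300
x1 = rng.choice([1, 2], size=n, p=[0.8, 0.2])
x2 = rng.choice([1, 2, 3], size=n, p=[0.2, 0.3, 0.5])
# True model (hypothetical coefficients): beta_{1,2}=0.5, beta_{2,2}=1, beta_{2,3}=2.
y = 1.0 + 0.5 * (x1 == 2) + 1.0 * (x2 == 2) + 2.0 * (x2 == 3) + rng.normal(0, 0.1, n)

# Design matrix with reference categories d_{1,1} and d_{2,1} dropped.
X = np.column_stack([np.ones(n), x1 == 2, x2 == 2, x2 == 3]).astype(float)
b0, b12, b22, b23 = np.linalg.lstsq(X, y, rcond=None)[0]

# LS means for x1: model predictions averaged over the x2 levels with EQUAL weight,
# exactly as in the two formulas above.
lsmean_11 = b0 + (b22 + b23) / 3
lsmean_12 = b0 + b12 + (b22 + b23) / 3
```

The difference of the two LS means is exactly the coefficient $\beta_{1,2}$, which is one way the LS means repackage the regression output.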

Uses I can think of
  • Given $x_1$ and $x_2$, the best (in the MSE sense) prediction of $y$ is $\beta_0 + \beta_{1,2} d_{1,2} + \beta_{2,2} d_{2,2} + \beta_{2,3} d_{2,3}$. This is also the expected result after treatment if $x_1$ and/or $x_2$ are interpreted as levels of treatment.
  • Given $x_1$ alone, the best prediction of $y$ when $x_1$ is in category $j$ is the conditional mean $\sum_{i=1}^n y_i \mathbb{1}_{\{d_{1,j,i}=1\}} \big/ \sum_{i=1}^n \mathbb{1}_{\{d_{1,j,i}=1\}}$, i.e. the average of $y$ over the observations in that category. This is also the expected result after treatment if $x_1$ is interpreted as the level of treatment.

Neither of these coincides with $\bar y_{1,1}$ or $\bar y_{1,2}$.
I get that

Least-squares means [are] predictions from a model over a regular grid, averaged over zero or more dimensions

(which is the Wiki excerpt for the tag), but what is the practical use of that?
So far I can see only one situation in which this could be useful: if we know that in the population the proportion of observations with $d_{i,j}=1$ and $d_{k,l}=1$ is the same for all combinations of $i,j,k,l$. Is that the intended use of LS means? Or can they be useful for description or hypothesis testing?

Richard Hardy
  • I guess that this answer, https://stats.stackexchange.com/a/162093/164061 , explaining lsmeans in a very concise way, also provides a very good reason to use these means. You make use of lsmeans when you wish to control for covariates. – Sextus Empiricus Mar 10 '18 at 12:59
  • @MartijnWeterings, so far I do not see the added value of LS means over the regression coefficients, but I see added confusion. I know how LS means are calculated, this is not the problem. The problem for me is in interpretation and use. – Richard Hardy Mar 10 '18 at 13:20
  • Brian Ripley wrote: "Some of us feel that type III sum of squares and so-called ls-means are statistical nonsense which should have been left in SAS." https://biostat-lists.wustl.edu/sympa/arc/s-news/1999-05/msg00320.html (But he doesn't say why) – kjetil b halvorsen Mar 10 '18 at 13:51
  • @RichardHardy It was unclear to me that this was your angle. So if I get it correctly now, you do not wonder so much about the value of LS means, but more specifically about the added value in comparison to regression coefficients. I guess that they are just a different way to express the results of the regression and a different way to present the coefficients. It is a different representation of the model. – Sextus Empiricus Mar 10 '18 at 14:02
  • @MartijnWeterings, you got me right this time. The biggest issue so far is the population parameters LS means correspond to, and I do not find such ones. Thus LS means seem kind of artificial to me... I guess this is also what Brian Ripley meant?.. – Richard Hardy Mar 10 '18 at 14:12
  • You could see it as a re-parameterization of the model into: $$y = \bar{y}_{2,1} d_{2,1} + \bar{y}_{2,2} d_{2,2} + \bar{y}_{2,3} d_{2,3} + 0.5*(\bar{y}_{1,2}-\bar{y}_{1,1}) (d_{1,2}-d_{1,1})+ \epsilon$$ or $$y = \bar{y}_{2,1} d_{2,1} + \bar{y}_{2,2} d_{2,2} + \bar{y}_{2,3} d_{2,3} + \left(\bar{y}_{1,1}-\frac{\bar{y}_{2,1}+\bar{y}_{2,2}+\bar{y}_{2,3}}{3}\right) d_{1,1}- \left(\bar{y}_{1,1}-\frac{\bar{y}_{2,1}+\bar{y}_{2,2}+\bar{y}_{2,3}}{3}\right) d_{1,2}+ \epsilon$$ – Sextus Empiricus Mar 10 '18 at 14:18
  • @MartijnWeterings, Or maybe (given these parameterizations) I should have said that I do not see any interesting question that the LS means would give an answer to. – Richard Hardy Mar 10 '18 at 14:23
  • In this question ( https://stats.stackexchange.com/questions/308556/both-variables-of-my-glmm-output-are-significant-dont-know-how-to-interpret-it ) a person desires some interpretation for the 4 coefficients in a 2x2 model (with cross terms). The summation of the coefficients provides an intuitive interpretation of the outcome of the model. I guess that the LS means do something similar for the case without the cross term and only main effects. The LS means solve the problem of presenting the model values in a way that is easier to interpret (the scale is more intuitive). – Sextus Empiricus Mar 10 '18 at 14:34
  • I often model `y ~ 0 + x` instead of `y ~ 1 + x` because I find this intercept term in place of a variable term annoying. – Sextus Empiricus Mar 10 '18 at 14:35
  • @MartijnWeterings, but the only valid interpretation (AFAIK) is under the assumption that each category is equally likely in the population. Otherwise it is misleading rather than easy to interpret, IMHO. You want effect size? Go for the regression coefficients. You want expected values given just one predictor? Go for conditional means. In this perspective, what question would the LS means be an answer to? – Richard Hardy Mar 10 '18 at 14:35
  • I agree it is misleading. One could still correct for the unequal distribution, but indeed it remains misleading. It is after all a fictitious value, some artificial construction of combining different groups. Still going back to my initial comment: I don't think that the LS means are so much in use as an alternative expression of the regression coefficients, but more as an alternative to group means (correcting for correlating covariates). – Sextus Empiricus Mar 10 '18 at 14:40

1 Answer


I disagree strongly with the "only situation" in the OP. EMMs (estimated marginal means, more restrictively known as least-squares means) are very useful for heading off a Simpson's paradox situation in evaluating the effects of a factor. In your example, consider a scenario where these three things are true:

  • When $x_2$ is held at any fixed level, the lowest mean response occurs at $x_1=1$.
  • For $x_1$ held fixed at either level, the highest mean response occurs when $x_2=3$.
  • The combination $(x_1=1, x_2=3)$ has a disproportionately large sample size, while $(x_1=1,x_2=1)$ and $(x_1=1,x_2=2)$ have small sample sizes.

Then it is possible that the ordinary marginal mean of $y$ at $x_1=1$ is higher than that at $x_1=2$, even though the mean for $x_1=1$ is less than that for $x_1=2$ at each level of $x_2$.

If one instead computes EMMs, the observed cell means for $x_1=1$ combined with each of $x_2=1,2,3$ receive equal weight, so that the EMM for $x_1=1$ is less than that for $x_1=2$.
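A minimal numeric sketch of this scenario (the cell means and cell sizes below are hypothetical, chosen only to satisfy the three conditions above):

```python
import numpy as np

# Hypothetical cell means: within every x2 column, x1=1 has the lower mean,
# and x2=3 has the highest means for both x1 levels.
cell_mean = np.array([[0.0, 1.0, 5.0],   # x1 = 1, x2 = 1..3
                      [1.0, 2.0, 6.0]])  # x1 = 2, x2 = 1..3
# Hypothetical cell sizes: the cell (x1=1, x2=3) is disproportionately large.
cell_n = np.array([[5,  5,  100],
                   [30, 30, 30]])

# Ordinary (sample-size-weighted) marginal means of y at each level of x1.
raw = (cell_mean * cell_n).sum(axis=1) / cell_n.sum(axis=1)

# EMMs: each cell mean gets EQUAL weight across the levels of x2.
emm = cell_mean.mean(axis=1)

print(raw)  # raw marginal mean at x1=1 exceeds that at x1=2
print(emm)  # but the EMM at x1=1 is below that at x1=2
```

Here the raw marginal means suggest $x_1=1$ gives the higher response, purely because its observations are concentrated in the high-response cell $x_2=3$; the EMMs recover the within-column ordering.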

EMMs are comparable to what is termed "unweighted means analysis" in old experimental design texts. The idea was useful many decades ago, and it still is.

The "basics" vignette for the R package emmeans has a concrete illustration and some discussion of such issues.

Disclaimer

I have spent the last 5 years or so developing/refining R packages for such purposes, so I'm not exactly an objective observer. I hope to hear other perspectives.

Russ Lenth
  • Thank you for your answer. I am genuinely curious and I do not wish to imply that there is only one situation where LS means are useful. To the contrary, it is the only situation I could come up with independently, and that is it. That is also why I am asking the question. I am now trying to understand your main argument, and the source you refer to is very helpful. Therefore, +1. *EMMs <...> are very useful for heading off a Simpson's paradox situation in evaluating the effects of a factor*. For one, the *regression coefficients* are a great way for evaluating the effects of a factor. ... – Richard Hardy Mar 10 '18 at 12:09
  • ...Meanwhile, EMMs seem to take this one step forward into a fantasy land (sorry about the strong wording) where the proportion of observations in each category is equal in the population. The regression coefficients are straightforward to interpret, while the EMMs reflect conditional means of some fictitious population which is different from the one we are dealing with (except when it is not). I could understand the use of EMMs minus the intercept, but it is a bit harder to justify EMMs when the intercept is included. I guess I should not be ranting like that, though :) Once again, thanks! – Richard Hardy Mar 10 '18 at 12:10
  • I invite you to try an example fitting a model with the default parameterization `contr.treatment`, and then using EMMs and pairwise comparisons. The regression coefficients then estimate comparisons between certain cases. Compare those to the corresponding pairwise comparisons of EMMs. Then notice that you can get all the comparisons easily from the EMMs, whereas you get only some of them from the regression coefficients. The more factors are involved, the less useful the regression coefficients become. – Russ Lenth Mar 10 '18 at 12:59
  • BTW, that same vignette has a discussion of analysis objectives where EMMs are not useful. – Russ Lenth Mar 10 '18 at 13:02
  • I’ve thought about this a bit, and think the fantasy land characterization is probably pretty appropriate. But that seems ok to me. In a lot of ways, any kind of a mean is an abstraction. A community may average 3.79 fire calls per day, for example, even though we can never have that many on a given day. We explain that this is what happens on average, just as we can explain that a certain EMM is the average of three predictions. It’s just an average, and it is useful in presenting model results without introducing bias in comparisons thereof. – Russ Lenth Mar 10 '18 at 23:37
  • Thanks. Still, 3.79 is an estimate of a population parameter for a real population, while LS means are estimates of population parameters for a fictitious population. This is disturbing, given that we actually have the real population and could easily get an estimate for it, yet we choose a fictitious population instead. – Richard Hardy Mar 11 '18 at 07:49
  • $(\mu_{11} + \mu_{21} + \mu_{31})/3$ is a parameter too. Read the paper by Searle, Speed, and Milliken. They define EMMs as estimates of population marginal means. – Russ Lenth Mar 11 '18 at 13:50
  • https://www.jstor.org/stable/2684063?seq=1#page_scan_tab_contents – Russ Lenth Mar 11 '18 at 13:54
  • My basic problem remains that this object does not seem to be an answer to any interesting question. Useless objects are typically neither discussed nor even reported, but this one is, suggesting it is not useless. What I am trying to find out is what use it could have. The uses I have seen so far do not seem to be of any practical interest. Just to note, my intention is not to bash LS means but to understand why they were created. So far I am failing at this. Perhaps Brian Ripley and I just think alike :) – Richard Hardy Mar 11 '18 at 14:20
  • OK, but for one, I was curious to see why anyone would be interested in this. Moreover, if I were to teach this to my students, I would have trouble finding any logical examples to illustrate this with. This year I skipped LS means because I could not find one. I was hoping for better luck for the next year. I am sorry, perhaps this is getting unproductive, but I appreciate your input. – Richard Hardy Mar 11 '18 at 14:26
  • Suppose you do a controlled experiment and collect 5 observations at each combination of two factors. Then you compute the marginal means. Are you comfortable with that? And what population are you making an inference for? Didn’t you define this population by your choice of factor levels and by your choice to run a balanced experiment? – Russ Lenth Mar 11 '18 at 14:40
  • So far I am not comfortable with LS means unless they coincide with something else that I am comfortable with. In the balanced case, they coincide with conditional means given just one of the factors, which answer the question "What is the expected result after treatment in a balanced population if I do not know anything about the other factor?". Why would one run a balanced experiment? Either the population is balanced, or one wants to estimate the effects on the different categories with equal precision. But if the population is not balanced, LS means do not seem to answer any relevant question. – Richard Hardy Mar 11 '18 at 15:24
  • The equal weights that LS means use are as insensible for an unbalanced population as unequal weights would be for a balanced population. Why would one want to distort the category proportions found in the actual population? Think of it this way: why would I use LS means if my categories were severely unbalanced in the population? Why should I give such a disproportionately high weight to the smallest categories? LS means neither help predict the result in the absence of the other factor nor measure the "average result", since a simple average disregards the imbalance (the two are the same). – Richard Hardy Mar 11 '18 at 15:24
  • @Richard Hardy: Designed experiments are often used in settings where the population concept itself is artificial. One is studying a process, not a population. – kjetil b halvorsen Feb 24 '21 at 14:15