Mediation/moderation: when interested in the effects of X on Y, should I leave my mediator/moderator(+interaction) in the analysis?

Question

If I am running a mediation analysis (X: independent; Y: dependent; M: mediator; also applies to moderation), but I am also interested in the simple correlation between X and Y (i.e., for my first hypothesis), should I interpret the full model including the mediator, or interpret a separate model that only includes X?

In other words, when interested in the correlation between X and Y, should I interpret a model that only includes X, to 'isolate' the association with Y, or should I interpret the model with X and M?

Often, in my experience, in line with the old Baron and Kenny method, several hypotheses are drafted prior to estimating mediation:

H1: "X has a significantly positive association with Y"; X -> Y

H2: X -> M

H3: M -> Y

H4: Mediation hypothesis

I keep hearing different opinions about this. Some say it is a good idea to isolate the effect, since you would not have controlled for M had you not had the mediation hypothesis or the data on M. Others say it is a good idea to include M precisely because it acts as a control variable.

Does this answer your question? [Two variables measure the same thing, are they confounding?](https://stats.stackexchange.com/questions/464502/two-variables-measure-the-same-thing-are-they-confounding) — Adrian Keister, Jan 27 '22 at 21:50
It's bordering on it. Say I'm testing a moderated mediation model (the model itself actually doesn't really matter). If I am now also interested in the effect of X on Y (/ the association between X and Y), or even of X on M, would you set up separate models (simple univariate regression models with just X as a predictor and either Y or M as a dependent variable) or would you just use the full moderated mediation model (with all variables) and interpret the coefficients of X -> Y from that model? In the latter case, those other variables would act as control variables of the effect of X -> Y. — user347909, Jan 27 '22 at 21:54
If you're interested in the effect of $X$ on $Y,$ you do NOT put $M$ in your model. Doing so conditions on $M,$ which stops causal information from flowing on the path $X\to M\to Y.$ If you're interested in the causal effect of $X$ on $M,$ then you can just model $M$ on $X.$ In that case, $Y$ is not a confounding variable, because it does not set up a backdoor path: the arrow is FROM $X$ TO $Y.$ — Adrian Keister, Jan 27 '22 at 22:03
Got it. I guess the second part of my question is not really statistics-heavy. If I am estimating X -> Y as part of a moderation analysis (that I am conducting afterwards), would I interpret a model with the moderator in it, or a model with just X? If I include the moderator variable, I am effectively controlling for the influence of the moderator. Would this be sensible to do? What about including the interaction term? — user347909, Jan 27 '22 at 22:22
I'm not sure I understand exactly what you're asking with the moderation analysis question. And I'm not sufficiently well-versed to answer the interaction question. — Adrian Keister, Jan 27 '22 at 22:24
Say I know I want to test a simple moderation model. I draft two hypotheses: X is associated with Y (1). W moderates the association between X and Y (2). For my first hypothesis (1), would I do well to include W in the model I am using? Technically, I am only wanting to make claims about the association between X and Y. However, by including W, I am controlling for the influence of W (i.e. the covariance between X and W). Would this be preferred? Apologies if I am not clear. — user347909, Jan 27 '22 at 22:33
Well, the $v$-structures (collider placement and existence) of those two hypotheses are different, so by Theorem 1.2.8 in Pearl's *Causality* book, 2nd Ed., your data should theoretically be able to distinguish between the two models. Unfortunately, I am not well-versed enough in causality with regression to be able to help you out. Chapter 5 of the same book would seem to be relevant, as you are essentially testing two different models. That's a very difficult book, though: I have not been able to make serious headway in it, as it assumes a lot of the reader. — Adrian Keister, Jan 27 '22 at 22:45

score 1 · Answer 1 · answered Jan 28 '22 at 23:10

The What If? book by Hernán and Robins addresses your situation in Section 18.2, most explicitly in the legend to Figure 18.4. With your "X" and "M" replaced respectively by "A" and "L" and an additional explicit path shown from A→Y that bypasses L:

> adjusting for L blocks the path A→L→Y but not the [bypass] path A→Y. Thus the A-Y association adjusted for L is a biased estimator of the total effect of A on Y but an unbiased estimator of the direct effect of A on Y that is not mediated through L.

In the associated text:

> Sometimes this problem is referred to as overadjustment for mediators when the average causal effect of A on Y is the contrast of interest.

One might argue that you should do both analyses. In your case and under your causal hypothesis, the model that omits M estimates the total effect of X on Y (A on Y in the book's notation). The model that includes M estimates the direct effect, that is, the portion of the total effect that is not mediated by M.
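To make the distinction concrete, here is a minimal simulation sketch. It assumes a purely linear data-generating process with no confounding; the coefficients `a`, `b`, `c_direct` and the helper `ols` are hypothetical choices for illustration, not part of the question or of the book. Under those assumptions, the model without M recovers the total effect, the model with M recovers the direct effect, and the two differ by the product of the X -> M and M -> Y coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical linear data-generating process (illustrative coefficients).
a, b, c_direct = 0.5, 0.8, 0.3          # X -> M, M -> Y, direct X -> Y
X = rng.normal(size=n)
M = a * X + rng.normal(size=n)
Y = c_direct * X + b * M + rng.normal(size=n)


def ols(y, *predictors):
    """Return OLS coefficients (intercept first) of y on the predictors."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


total = ols(Y, X)[1]                     # Y ~ X: total effect of X
_, direct, b_hat = ols(Y, X, M)          # Y ~ X + M: direct effect of X, M -> Y
a_hat = ols(M, X)[1]                     # M ~ X: X -> M
indirect = a_hat * b_hat                 # product-of-coefficients indirect effect

# By construction, total should be close to c_direct + a * b = 0.7
# and direct should be close to c_direct = 0.3.
print(f"total    (Y ~ X):     {total:.3f}")
print(f"direct   (Y ~ X + M): {direct:.3f}")
print(f"indirect (a * b):     {indirect:.3f}")
```

With OLS on the same sample, `total` equals `direct + indirect` up to floating-point error, so the two regressions are complementary rather than competing. That identity relies on linearity; it need not hold for nonlinear models or when an X-by-M interaction is present.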

Perfect - many thanks. This is in line with suggestions by Adrian Keister and also by several papers I've been reading (e.g., Andrew F. Hayes (2009) Beyond Baron and Kenny: Statistical Mediation Analysis in the New Millennium, Communication Monographs, 76:4, 408-420, DOI: 10.1080/03637750903310360). However, how would this work for a moderation analysis? When estimating X-Y (A-Y), would it be sensible to include the moderator in the model, given that it acts as a control variable? — user347909, Jan 30 '22 at 00:07
A moderation is essentially an interaction. See [this page](https://stats.stackexchange.com/q/18848/28500). It’s best to include both the “main effects” and the interaction term in the model. That doesn’t mean that the moderation/interaction necessarily is represented as a simple product term, however. — EdM, Jan 30 '22 at 03:24
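As a rough illustration of the comment above, the sketch below fits a model with both main effects and a simple product term. Everything in it is an assumption for illustration: the simulated data, the coefficients, the independence of X and W, and the use of a plain product term (which, as noted, is not the only way to represent moderation). It shows that in the interaction model the coefficient on X is the slope of Y on X when W = 0, not an overall X-Y association.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical data: the slope of Y on X depends on the moderator W.
X = rng.normal(size=n)
W = rng.normal(loc=2.0, size=n)          # W is deliberately not centered at 0
Y = 0.4 * X + 0.2 * W + 0.5 * X * W + rng.normal(size=n)


def ols(y, *predictors):
    """Return OLS coefficients (intercept first) of y on the predictors."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Y ~ X + W + X:W, i.e. both main effects plus the interaction term.
_, b_x, b_w, b_xw = ols(Y, X, W, X * W)

# In this model b_x is the slope of Y on X at W = 0 only.
# The slope at any other value w of the moderator is b_x + b_xw * w.
slope_at_mean_w = b_x + b_xw * W.mean()

# Y ~ X alone: an average slope over the distribution of W in the sample.
marginal_slope = ols(Y, X)[1]

print(f"slope of X at W = 0:       {b_x:.3f}")
print(f"slope of X at mean of W:   {slope_at_mean_w:.3f}")
print(f"slope of X from Y ~ X:     {marginal_slope:.3f}")
```

In this setup the X-only model and the interaction model evaluated at the mean of W give similar slopes, because X and W were simulated as independent. The coefficient on X read directly off the interaction model answers a different question (the slope at W = 0), which is why centering W is a common way to make that coefficient easier to interpret.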
