Lurking variables probably have something to do with this. I'm just trying to figure out how their difference can affect a linear model.
1 Answer
The (potentially causal) interpretation of your model doesn't come from the model itself. It comes from the design / setup of your study. Causality can primarily be inferred from your model when you have run a true experiment. There are various methods for attempting to infer causality with observational data (e.g., instrumental variables, difference-in-differences, propensity scores, etc.), but they all require additional assumptions and are generally not as strong as experiments. If you don't have a true experiment, it is safest to assume your model estimates a marginal association only.
Omitted / lurking variables affect this when they are correlated with $X$ variables in your model and with your response ($Y$) variable. In that case, they bias your estimates such that a variable could appear causal when it actually isn't. To understand this better, it may help to read my answer here: Estimating $b_1x_1+b_2x_2$ instead of $b_1x_1+b_2x_2+b_3x_3$.
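As a concrete illustration, here is a minimal simulated sketch in Python/numpy (not from the linked answer; the data-generating process and coefficients are made up for demonstration). A confounder $x_3$ drives both $x_2$ and $Y$: fitting the full model recovers the true coefficient on $x_2$, while omitting $x_3$ biases it upward.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process: x3 drives both x2 and y.
x3 = rng.normal(size=n)
x2 = 0.8 * x3 + rng.normal(size=n)              # x2 is correlated with x3
y = 1.0 * x2 + 2.0 * x3 + rng.normal(size=n)    # true effect of x2 on y is 1.0

def ols(X, y):
    """Ordinary least squares with an intercept, via least squares."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Full model: the coefficient on x2 is close to the true value 1.0.
print(ols(np.column_stack([x2, x3]), y))

# Omitting the confounder x3: the coefficient on x2 is biased upward,
# roughly 1.0 + 2.0 * Cov(x2, x3) / Var(x2) ≈ 1.98.
print(ols(x2.reshape(-1, 1), y))
```

The size of the bias depends on how strongly the omitted variable is related to both the included predictor and the response; if either correlation is zero, the bias vanishes.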

- I'm a little confused by the link you posted. Are you saying that correlation affects the error? So if I have a variable that is correlated then I have a bias in the model? Sorry, I'm a bit confused... – LSerrano113 Dec 09 '14 at 21:39
- @Carla, yes, that's the idea. If there is a variable x3 that is correlated with both another x variable (x2) and with y, then there can be problems. E.g., x2 can be causally unrelated to Y, but will look like it causes Y. – gung - Reinstate Monica Dec 09 '14 at 21:44
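To see the scenario in that comment numerically, here is a minimal sketch (a hypothetical simulation, under the same made-up assumptions as above) where x2 has no causal effect on y at all, yet regressing y on x2 alone produces a clearly nonzero slope because both depend on x3.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

x3 = rng.normal(size=n)
x2 = 0.8 * x3 + rng.normal(size=n)   # x2 is correlated with x3 but has no effect on y
y = 2.0 * x3 + rng.normal(size=n)    # only x3 causes y

# Simple regression of y on x2 alone (with an intercept).
X = np.column_stack([np.ones(n), x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # slope ≈ 2.0 * 0.8 / 1.64 ≈ 0.98, even though x2 does not cause y
```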