Why arrange variables by causality in bivariate regression?

Question

Suppose we have variables $(X,Y)$ and theory tells us that $X$ $\overset{\text{cause}}{\implies} Y$. Perhaps they're time-series variables, in which case it would be common to see something like this:

$$\boxed{Y_{t+1} = a + b X_{t} + e_{t+1}.\,\,\,\,\,(1)}$$

My question is why it's not just as legitimate to specify:

$$\boxed{X_t = \alpha + \beta Y_{t+1}+\varepsilon_{t+1}\,\,\,\,\,(2)}$$

where $\alpha = \frac{-a}{b}$, $\varepsilon_{t+1} = \frac{-e_{t+1}}{b}$, and $\beta = \frac{1}{b}$?

I understand that in finite samples these relations between the coefficients usually don't hold for the estimates. However, this doesn't answer the question of why the former specification is more legitimate. Also, the application isn't forecasting, so I don't need the convenience of plugging $X_t$ into $(1)$ and arriving at $\text{forecast}(Y_{t+1})$ with no effort.
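The mismatch between the estimated coefficients of $(1)$ and $(2)$ can be checked with a small simulation. This is a minimal sketch, not anything from the paper: the values of `a`, `b`, `noise_sd`, the sample size, and the helper names `ols` and `r_squared` are all assumptions chosen for illustration. It fits both regressions by least squares and compares the slope of $(2)$ with the reciprocal of the slope of $(1)$.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from the least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Squared sample correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


rng = random.Random(0)            # fixed seed so the run is repeatable
a, b, noise_sd = 1.0, 2.0, 2.0    # hypothetical values, not from the question

x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [a + b * xi + rng.gauss(0.0, noise_sd) for xi in x]

a_hat, b_hat = ols(x, y)          # equation (1): Y regressed on X
alpha_hat, beta_hat = ols(y, x)   # equation (2): X regressed on Y
r2 = r_squared(x, y)

print("slope of Y on X:       ", b_hat)
print("slope of X on Y:       ", beta_hat)
print("reciprocal 1 / b_hat:  ", 1.0 / b_hat)
print("product of the slopes: ", b_hat * beta_hat)
print("squared correlation:   ", r2)
```

The product of the two least-squares slopes equals the squared sample correlation, so the slope of $(2)$ matches $1/\hat b$ only when the fit is perfect. With noise in the data the reverse slope is pulled toward zero relative to $1/\hat b$.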

This question occurred to me when I was reading an empirical paper. They had variables $(A,B,Y)$ and they thought that $A,B \overset{\text{cause}}{\implies} Y$. Specifically they thought a good approximation of the relationship was:

$$\boxed{Y_{t+1} = a + b(A_t - B_t) + e_{t+1}.\,\,\,\,\,(3)}$$

However they had theoretical reasons to think that there is a break in $b$ when $(A_t - B_t)$ was "large" versus a "small" $(A_t - B_t)$. They then went on a big spree of testing all sorts of different dummy variable combos. They even went on to specify a non-linear smooth transition model. However, in this spot it seems far more sensible, interpretable, easier (and probably more statistically powerful) to just do:

$$\boxed{Q_\tau(A_t-B_t|Y_{t+1}) = \alpha(\tau) + \beta(\tau)Y_{t+1} + \varepsilon_{t+1}(\tau)\,\,\,\,\,(4)}$$

where $Q$ is the quantile function.

Is this wrong?

Related question : [Effect of switching response and explanatory variable in simple linear regression](http://stats.stackexchange.com/questions/20553/effect-of-switching-response-and-explanatory-variable-in-simple-linear-regressio) — Elvis, Dec 24 '12 at 07:08
If the answer is helpful you might consider accepting it. If not, then you might say what is still unclear. — conjugateprior, Jan 14 '13 at 17:09

conjugateprior · Accepted Answer · 2013-01-14T17:09:21.153

Distinguish two quantities that you might ask a regression to estimate:

1. The expected value of $Y_{t+1}$ given that you observe $X_t$. This is always estimated by the regression of $Y_{t+1}$ on $X_t$, and it targets the conditional distribution $P(Y_{t+1}\mid X_t)$. When you condition on different quantities you get (correctly) different answers because you are targeting different conditional distributions.
2. The causal effect of changes in $X_t$ on the expected value of $Y_{t+1}$. This is only sometimes estimated by the regression of $Y_{t+1}$ on $X_t$, because it targets $P(Y_{t+1}\mid \text{do}(X_t))$, the distribution of $Y_{t+1}$ when you intervene to set the value of $X_t$, e.g. in an experiment where $X$ is a treatment variable. This is a stable feature of the system under study, even when quantity 1 varies, and may sometimes be identified by regression by conditioning on confounders, e.g. common causes of $X$ and $Y$. There are other strategies, but that's the one relevant to your question.
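The role of confounders in the second quantity can be illustrated with a simulation. This is only a sketch under assumed values: the common cause `z`, the coefficients `b` and `c`, and the helpers `slope` and `residuals` are hypothetical and not part of the question. The regression of $Y$ on $X$ alone estimates the first quantity, while partialling the confounder out of both variables (equivalent to including it as a regressor, by the Frisch-Waugh-Lovell result) recovers the causal coefficient in this linear setup.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after a least-squares regression on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s = slope(x, y)
    return [(yi - my) - s * (xi - mx) for xi, yi in zip(x, y)]


rng = random.Random(1)   # fixed seed so the run is repeatable
n = 20000
b, c = 1.0, 3.0          # hypothetical causal effect of X and effect of confounder Z

z = [rng.gauss(0.0, 1.0) for _ in range(n)]                      # common cause
x = [zi + rng.gauss(0.0, 1.0) for zi in z]                       # Z -> X
y = [b * xi + c * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]  # X -> Y, Z -> Y

naive = slope(x, y)                                   # quantity 1: Y on X only
adjusted = slope(residuals(z, x), residuals(z, y))    # conditions on Z

print("slope ignoring Z:      ", naive)
print("slope conditioning on Z:", adjusted)
print("true causal effect b:   ", b)
```

Both numbers are correct answers to different questions. The unadjusted slope describes what you expect $Y$ to be when you merely observe $X$; the adjusted slope describes what happens to $Y$ when $X$ is changed, given that $Z$ is the only confounder in this made-up system.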

With this distinction in mind it's clear that regressing $X_t$ on $Y_{t+1}$ could be a good idea if you are interested in the first sort of quantity, and a bad idea if you are interested in the second sort of quantity. If $t$ is time then the expected value of $X_t$ when you intervene to set $Y_{t+1}$ is not obviously well-defined, because $X_t$ is already observed; a hypothetical 'intervention' in the future would be changing the present.

This is not just a time issue though: if $X$ causes $Y$ but not vice versa, then the expected value of $X$ when you intervene to set $Y$ is simply the expected value of $X$ (and in the regression corresponding to this 'experiment' the coefficient of $Y$ is zero). In contrast, the expected value of $X$ given that you simply observe $Y$ is perfectly well defined (indeed, for real-valued variables, the slopes of the two corresponding regressions $Y \mid X$ and $X \mid Y$ will be functions of the same correlation coefficient).
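The contrast between observing $Y$ and intervening on $Y$ can be sketched in a simulation. Everything here is an assumption for illustration: the coefficient `b`, the noise scales, the way the 'experiments' assign values, and the helper `slope`. The system is one where $X$ causes $Y$ and not the reverse.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


rng = random.Random(2)   # fixed seed so the run is repeatable
n = 20000
b = 2.0                  # hypothetical causal effect of X on Y

# Observational regime: X arises naturally and Y responds to it.
x_obs = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_obs = [b * xi + rng.gauss(0.0, 1.0) for xi in x_obs]

# do(Y): an experimenter sets Y directly. X still follows its own
# mechanism, which does not depend on Y.
x_doy = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_doy = [rng.gauss(0.0, 1.0) for _ in range(n)]

# do(X): an experimenter sets X and Y responds through the same mechanism.
x_dox = [rng.uniform(-2.0, 2.0) for _ in range(n)]
y_dox = [b * xi + rng.gauss(0.0, 1.0) for xi in x_dox]

obs_x_on_y = slope(y_obs, x_obs)   # X regressed on observed Y
doy_x_on_y = slope(y_doy, x_doy)   # X regressed on intervened Y
dox_y_on_x = slope(x_dox, y_dox)   # Y regressed on intervened X

print("X on Y, observational:", obs_x_on_y)
print("X on Y, under do(Y):  ", doy_x_on_y)
print("Y on X, under do(X):  ", dox_y_on_x)
```

In this setup the observational slope of $X$ on $Y$ is clearly nonzero, so it is informative about $X$ when you see $Y$. Under the intervention on $Y$ the slope is near zero, since setting $Y$ does nothing to $X$, while the intervention on $X$ returns a slope near `b`.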

In short, it's perfectly legitimate to respecify things any way you like, including all the ways you mention, provided you are interested only in the first type of quantity. What is potentially confusing is that although the two quantities are quite different you might use regression to estimate both of them.
