Suppose we have variables $(X,Y)$ and theory tells us that $X \overset{\text{cause}}{\implies} Y$. Perhaps they are time-series variables, in which case it would be common to see a specification like this:
$$\boxed{Y_{t+1} = a + b X_{t} + e_{t+1}.\,\,\,\,\,(1)}$$
My question is why it's not just as legitimate to specify:
$$\boxed{X_t = \alpha + \beta Y_{t+1}+\varepsilon_{t+1}\,\,\,\,\,(2)}$$
where $\alpha = -\frac{a}{b}$, $\beta = \frac{1}{b}$, and $\varepsilon_{t+1} = -\frac{e_{t+1}}{b}$?
I understand that in finite samples these identities usually do not hold for the estimates. However, that does not answer the question of why the former is more legitimate. Also, the application isn't forecasting, so I don't need the convenience of plugging $X_t$ into $(1)$ and obtaining $\text{forecast}(Y_{t+1})$ with no effort.
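To make the gap between $(1)$ and $(2)$ concrete, here is a minimal simulation sketch using only the Python standard library. The parameter values ($a = 1$, $b = 2$, unit-variance Gaussian $X_t$ and $e_{t+1}$) and the helper names are assumptions chosen purely for illustration. Under these assumptions the least-squares slope of $(2)$ targets $\operatorname{Cov}(X,Y)/\operatorname{Var}(Y) = 0.4$ rather than $1/b = 0.5$, because $Y_{t+1}$ is correlated with $\varepsilon_{t+1}$ by construction, and the product of the two estimated slopes equals the squared sample correlation.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def squared_corr(x, y):
    """Squared sample correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


rng = random.Random(0)   # fixed seed so the run is reproducible
a, b = 1.0, 2.0          # hypothetical "true" parameters of equation (1)
n = 10_000

x = [rng.gauss(0.0, 1.0) for _ in range(n)]           # X_t
y = [a + b * xi + rng.gauss(0.0, 1.0) for xi in x]    # Y_{t+1}

a_hat, b_hat = ols(x, y)          # forward regression, equation (1)
alpha_hat, beta_hat = ols(y, x)   # reverse regression, equation (2)
r2 = squared_corr(x, y)

print(f"forward slope b_hat        = {b_hat:.4f}")
print(f"naive inverse 1 / b_hat    = {1.0 / b_hat:.4f}")
print(f"reverse slope beta_hat     = {beta_hat:.4f}")
print(f"b_hat * beta_hat           = {b_hat * beta_hat:.4f}")
print(f"squared sample correlation = {r2:.4f}")
```

The two slopes would be exact reciprocals only if the sample correlation were $\pm 1$, that is, if there were no error term at all.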
This question occurred to me when I was reading an empirical paper. The authors had variables $(A,B,Y)$ and believed that $A,B \overset{\text{cause}}{\implies} Y$. Specifically, they thought a good approximation of the relationship was:
$$\boxed{Y_{t+1} = a + b(A_t - B_t) + e_{t+1}.\,\,\,\,\,(3)}$$
However, they had theoretical reasons to think that $b$ differs depending on whether $(A_t - B_t)$ is "large" or "small". They then went on a big spree of testing all sorts of dummy-variable combinations, and even went on to specify a nonlinear smooth-transition model. In this situation, though, it seems far more sensible, more interpretable, easier (and probably more statistically powerful) to just estimate:
$$\boxed{Q_\tau(A_t-B_t \mid Y_{t+1}) = \alpha(\tau) + \beta(\tau)Y_{t+1}\,\,\,\,\,(4)}$$
where $Q_\tau$ is the conditional $\tau$-th quantile function.
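To show what estimating $(4)$ would involve, here is a minimal sketch that fits a linear conditional quantile of the spread $A_t - B_t$ on $Y_{t+1}$ by minimising the check (pinball) loss. Everything in it is an assumption for illustration: the simulated data-generating process is hypothetical and not taken from the paper, the function names are mine, and the brute-force search over pairs of points (which relies on some minimiser of the check loss passing through two observations) is only practical for tiny samples. In real work one would use a library routine such as `QuantReg` in statsmodels.

```python
import random
from itertools import combinations


def pinball_loss(residuals, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0, summed."""
    return sum(tau * r if r >= 0 else (tau - 1.0) * r for r in residuals)


def quantile_line(y, s, tau):
    """Brute-force fit of Q_tau(s | y) = alpha + beta * y.

    Tries the line through every pair of observations and keeps the one
    with the smallest check loss. O(n^3), so for illustration only.
    """
    best = None
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue  # a vertical line is not a valid regression line
        beta = (s[j] - s[i]) / (y[j] - y[i])
        alpha = s[i] - beta * y[i]
        loss = pinball_loss([sk - alpha - beta * yk for yk, sk in zip(y, s)], tau)
        if best is None or loss < best[0]:
            best = (loss, alpha, beta)
    return best[1], best[2]


rng = random.Random(1)   # fixed seed so the run is reproducible
n = 80

# Hypothetical spread A_t - B_t.
spread = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical Y_{t+1}: the slope on the spread is larger when |spread| > 1.
y_next = [
    (1.5 if abs(sp) > 1.0 else 0.5) * sp + rng.gauss(0.0, 0.5)
    for sp in spread
]

# Fit equation (4) at a few quantile levels of the spread.
fits = {tau: quantile_line(y_next, spread, tau) for tau in (0.1, 0.5, 0.9)}
for tau, (alpha, beta) in fits.items():
    print(f"tau={tau:.1f}: alpha={alpha:.3f}, beta={beta:.3f}")
```

Note that the coefficients $\beta(\tau)$ describe how quantiles of the spread vary with $Y_{t+1}$, which is not the same object as the slope $b$ in $(3)$.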
Is this wrong?