
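Suppose I have continuous random variables $X,Y,Z$ with the following causal structure:

[Figure: DAG over $X$, $Y$, $Z$; consistent with the model below, its edges are $X \to Y$, $X \to Z$, and $Y \to Z$]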

I hypothesize a simple regression model for each r.v., specifically,

$$\begin{aligned} Y &= a_1 X + \mathcal{N}(\mu_1,\sigma_1^2),\\ Z &= a_2 X + a_3 Y + \mathcal{N}(\mu_2,\sigma_2^2). \end{aligned}$$

I have many observations sampled from the joint distribution $(X,Y,Z)$ and would like to infer the parameters. I am particularly interested in inferring $a_3$, i.e., the link from $Y$ to $Z$.

What method is appropriate for this? I could imagine using multiple linear regression to fit $Z \sim \alpha_1 X + \alpha_2 Y + \beta$, and then using $\alpha_2$ as my estimate for $a_3$; is this a good approach?

MarianD
D.W.

1 Answer


Based on your DAG, and under linearity, we have a structural equation model (SEM) with two structural equations:

$Z = \alpha_3 Y + \alpha_2 X + \epsilon_1$

$Y = \alpha_1 X + \epsilon_2$

Here $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the direct causal effects; they correspond to $a_1$, $a_2$, and $a_3$ in your notation.

Substituting the second equation into the first, we can see that

$Z = \alpha_3 \alpha_1 X + \alpha_2 X + \alpha_3 \epsilon_2 + \epsilon_1 = \alpha_4 X + \epsilon_3$

where $\alpha_4 = \alpha_3 \alpha_1 + \alpha_2$ represents the total causal effect of $X$ on $Z$, and $\epsilon_3 = \alpha_3 \epsilon_2 + \epsilon_1$.

I now add some necessary (causal) assumptions. In the initial two structural equations, the structural errors are exogenous ($E[\epsilon_1 | Y, X]=0$ and $E[\epsilon_2 | X]=0$) and independent of each other.

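As a consequence, the structural error $\epsilon_3$ in the last structural equation is exogenous too: by the law of iterated expectations, $E[\epsilon_1 | X]=0$, so $E[\epsilon_3 | X] = \alpha_3 E[\epsilon_2 | X] + E[\epsilon_1 | X] = 0$.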

You can then run three useful regressions:

$Z = \theta_1 X + u_1$

$Y = \theta_2 X + u_2$

$Z = \theta_3 Y + \theta_4 X + u_3$

Here $\theta_1$ identifies $\alpha_4$, $\theta_2$ identifies $\alpha_1$, $\theta_3$ identifies $\alpha_3$ (what you are looking for), and $\theta_4$ identifies $\alpha_2$.
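As a quick sanity check, the three regressions can be tried on simulated data. The following is only a minimal sketch: the parameter values, the standard normal errors, the sample size, and the helper `ols` are all hypothetical choices made for illustration, and an intercept is included in every fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # hypothetical sample size

# Hypothetical "true" direct effects, chosen only for illustration
alpha1, alpha2, alpha3 = 0.8, -0.5, 1.5
alpha4 = alpha3 * alpha1 + alpha2  # total effect of X on Z

# Simulate from the two structural equations with independent errors
X = rng.normal(size=n)
eps2 = rng.normal(size=n)
eps1 = rng.normal(size=n)
Y = alpha1 * X + eps2
Z = alpha3 * Y + alpha2 * X + eps1

def ols(y, *regressors):
    """Least-squares slopes of y on the regressors (intercept fitted, then dropped)."""
    A = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]

(theta1,) = ols(Z, X)         # Z on X
(theta2,) = ols(Y, X)         # Y on X
theta3, theta4 = ols(Z, Y, X) # Z on Y and X

print(f"theta1 = {theta1:.3f}  (alpha4 = {alpha4:.3f})")
print(f"theta2 = {theta2:.3f}  (alpha1 = {alpha1:.3f})")
print(f"theta3 = {theta3:.3f}  (alpha3 = {alpha3:.3f})")
print(f"theta4 = {theta4:.3f}  (alpha2 = {alpha2:.3f})")
```

With a sample this large, each estimate should land close to the structural parameter it identifies, up to sampling noise.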

Note that not all regressions are "good". For example, if we run the regression

$Z = \theta_5 Y + u_4$

the coefficient $\theta_5$ does not identify any parameter of the SEM. Indeed, $\theta_5$ is biased for $\alpha_3$, because $X$ acts as an omitted confounder.
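To make the bias explicit, here is a short derivation of the population value of $\theta_5$. It is a sketch that assumes an intercept is included (or the variables are centered) and that $X$ is not degenerate; the symbols $\sigma_X^2 = \mathrm{Var}(X)$ and $\sigma_{\epsilon_2}^2 = \mathrm{Var}(\epsilon_2)$ are introduced only for this illustration.

$$\begin{aligned}
\theta_5 &= \frac{\mathrm{Cov}(Z,Y)}{\mathrm{Var}(Y)} = \frac{\alpha_3 \mathrm{Var}(Y) + \alpha_2 \mathrm{Cov}(X,Y) + \mathrm{Cov}(\epsilon_1,Y)}{\mathrm{Var}(Y)} \\
&= \alpha_3 + \alpha_2 \frac{\alpha_1 \sigma_X^2}{\alpha_1^2 \sigma_X^2 + \sigma_{\epsilon_2}^2}
\end{aligned}$$

The second line uses $\mathrm{Cov}(\epsilon_1,Y)=0$ and $\mathrm{Cov}(X,\epsilon_2)=0$, which follow from the exogeneity assumptions, so that $\mathrm{Cov}(X,Y)=\alpha_1 \sigma_X^2$ and $\mathrm{Var}(Y)=\alpha_1^2 \sigma_X^2 + \sigma_{\epsilon_2}^2$. The extra term vanishes only if $\alpha_1 \alpha_2 = 0$, that is, when $X$ does not actually confound the relation between $Y$ and $Z$.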

markowitz