5

Say I have the following causal model:

  • outcome variable: Y (e.g. sales)
  • treatment variable: T (e.g. price)
  • covariate variable: x2 (e.g. traffic)
  • unobserved variables: U (unobserved)

causal relation:
[Figure: causal diagram (image not available)]

How can I estimate the causal effect of T on Y, including both the direct effect of T on Y and the effect of T on Y through x2? The challenge is that x2 may also be affected by other unobserved factors. Is there a methodology for this?

Update:

The answer below does not seem sufficient. Regressing Y on T alone can't remove the effect of U, which is unmeasured and unobserved.
Is there any method to remove the impact of U?

dingx
  • Can you describe your data (time series, panel, cross-section, etc) and how the variation in price works in your setting (experiment, equilibrium, etc)? – dimitriy Oct 22 '20 at 14:06

2 Answers

6

Thank you for including a causal diagram!

Answer: Simply regress $Y$ on $T$ like this: $$Y=aT+b.$$ There is no backdoor path from $T$ to $Y,$ so you don't need to condition on anything. In fact, if you want the full causal effect of $T$ on $Y,$ you need to NOT condition on $x_2.$
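
As a rough check of this claim, here is a minimal simulation sketch in Python. It assumes a linear model in which $U$ affects $x_2$ but is independent of $T$; the coefficient values and variable names are hypothetical and chosen only for illustration. Under those assumptions, the slope from regressing $Y$ on $T$ alone should land close to the total effect (direct plus mediated), with $U$ contributing only noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical coefficients, chosen only for illustration.
a_t = -2.0   # effect of T on x2
b_t = 1.5    # direct effect of T on Y
b_x = 0.5    # effect of x2 on Y

t = rng.normal(size=n)                        # treatment; independent of U by assumption
u = rng.normal(size=n)                        # unobserved cause of x2
x2 = a_t * t + u                              # mediator, driven by both T and U
y = b_t * t + b_x * x2 + rng.normal(size=n)   # outcome

# Regress Y on T alone (with an intercept); polyfit returns [slope, intercept].
slope_total = np.polyfit(t, y, 1)[0]

true_total = b_t + b_x * a_t                  # direct effect + effect through x2
print(f"true total effect: {true_total:.3f}, estimated: {slope_total:.3f}")
```

If $U$ also affected $T$, this would no longer hold, because a backdoor path from $T$ to $Y$ would open.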

You have a mediation situation, so there are other numbers in which you might be interested. You can consult Causal Inference in Statistics: A Primer, by Pearl, Glymour, and Jewell, for more information on mediation.

Adrian Keister
  • Thanks for your comment. If I regress only on T, how do I remove the impact of x2 (essentially, of U on x2) on Y? E.g. if I keep T unchanged and then change U, this will change x2, and x2 will in turn change Y. The regression will attribute the change in Y to T, which is wrong since T is unchanged and the root cause is U, right? – dingx Oct 23 '20 at 08:14
  • You could condition on $T$ to find the causal impact of $x_2$ on $Y$, then find the causal impact of $U$ on $x_2$. Perhaps subtract or divide out what you don't want? I'm thinking along the lines of a two-stage linear regression, like an instrumental variable. Is $U$ measured? – Adrian Keister Oct 23 '20 at 20:00
  • U is unmeasured and unobserved. – dingx Oct 24 '20 at 11:37
  • Do you have any comments? I think the answer can't remove the impact of U, which is unmeasured and unobserved. – dingx Oct 30 '20 at 03:03
  • Well, I guess I have two comments. 1. Do you really need to remove $U?$ 2. The only ways I can see to remove the effects of $U$ are one of the following: a. Measure $U$ so you can condition on it. b. Insert an instrumental variable between $U$ and $x_2$ and do 2-stage linear regression. c. Insert a variable between $x_2$ and $Y,$ thus allowing you to use the front-door adjustment formula. – Adrian Keister Oct 30 '20 at 12:36
1

To simplify, I am going to make the problem linear in parameters. You have a structural-form equation for the outcome $y$, the intermediate outcome equation for $x$, and an independence assumption:

$$ \begin{align*} y_i &=\beta_1+\beta_t \cdot t_i + \beta_x \cdot x_i + \varepsilon_i \\ x_i &= \alpha_1+\alpha_t \cdot t_i + u_i \\ (t,x) & \perp \!\!\! \perp \varepsilon \\ \end{align*}$$

Plugging the second into the first gets you the reduced-form equation for the outcome:

$$ y_i = (\beta_1 + \beta_x \cdot \alpha_1) + (\beta_t +\beta_x \cdot \alpha_t) \cdot t_i + (\beta_x \cdot u_i + \varepsilon_i) $$
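
To see why the unobserved $u$ in the composite error need not bias this regression, assume (this is implicit in the DAG rather than written in the equations above) that $u$ is independent of $t$. The population OLS slope from regressing $y$ on $t$ is then

$$ \frac{\operatorname{Cov}(y,t)}{\operatorname{Var}(t)} = (\beta_t + \beta_x \cdot \alpha_t) + \frac{\beta_x \cdot \operatorname{Cov}(u,t) + \operatorname{Cov}(\varepsilon,t)}{\operatorname{Var}(t)} = \beta_t + \beta_x \cdot \alpha_t $$

because both covariances in the numerator are zero. Under that assumption, $u$ only adds variance to the error term; it does not shift the coefficient on $t$.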

You have two effects: $$\begin{align*} \text{Total Effect: }& E[y \vert t=1]-E[y \vert t=0] = \beta_t +\beta_x \cdot \alpha_t \\ \text{Direct Effect: }& E[y \vert t=1,x]-E[y \vert t=0, x] = \beta_t \\ \end{align*}$$

You can use the reduced-form outcome equation to estimate the first, and you can use the structural-form equation to estimate the second. A difference of the two recovers the indirect effect.
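
The decomposition can be sketched on simulated data. The Python snippet below is an illustration under assumed parameter values (all numbers and the helper `ols` are hypothetical), with $u$ and $\varepsilon$ drawn independently of $t$. It fits the reduced form and the structural form by least squares and takes the difference of the two coefficients on $t$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical structural parameters (illustration only).
alpha_1, alpha_t = 1.0, -2.0
beta_1, beta_t, beta_x = 3.0, 1.5, 0.5

t = rng.integers(0, 2, size=n).astype(float)   # binary treatment
u = rng.normal(size=n)                         # unobserved shock to x, independent of t
eps = rng.normal(size=n)                       # outcome error, independent of (t, x)
x = alpha_1 + alpha_t * t + u                  # intermediate outcome
y = beta_1 + beta_t * t + beta_x * x + eps     # structural outcome equation


def ols(y, *cols):
    """Return least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


total = ols(y, t)[1]         # reduced form: y on t
direct = ols(y, t, x)[1]     # structural form: y on t and x
indirect = total - direct    # should be close to beta_x * alpha_t

print(f"total    ~ {total:.3f}  (true {beta_t + beta_x * alpha_t:.3f})")
print(f"direct   ~ {direct:.3f}  (true {beta_t:.3f})")
print(f"indirect ~ {indirect:.3f}  (true {beta_x * alpha_t:.3f})")
```

This gives point estimates only. Getting a standard error for the difference requires estimating the two equations jointly or using something like a bootstrap.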

Here's a toy example using Stata where the indirect effect dominates:

. clear

. sysuse auto, clear
(1978 Automobile Data)

. quietly reg price i.foreign

. estimates store rf

. quietly reg price i.foreign c.mpg

. estimates store sf

. suest rf sf

Simultaneous results for rf, sf

                                                Number of obs     =         74

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
rf_mean      |
     foreign |
    Foreign  |   312.2587   696.9581     0.45   0.654    -1053.754    1678.271
       _cons |   6072.423   428.2447    14.18   0.000     5233.079    6911.767
-------------+----------------------------------------------------------------
rf_lnvar     |
       _cons |    15.9902   .2260545    70.74   0.000     15.54714    16.43325
-------------+----------------------------------------------------------------
sf_mean      |
     foreign |
    Foreign  |   1767.292   599.3555     2.95   0.003     592.5771    2942.007
         mpg |  -294.1955   59.50419    -4.94   0.000    -410.8216   -177.5695
       _cons |   11905.42   1343.753     8.86   0.000     9271.709    14539.12
-------------+----------------------------------------------------------------
sf_lnvar     |
       _cons |    15.6727   .2476991    63.27   0.000     15.18722    16.15818
------------------------------------------------------------------------------

. nlcom indirect_effect:[rf_mean]_b[1.foreign] - [sf_mean]_b[1.foreign]

indirect_e~t:  [rf_mean]_b[1.foreign] - [sf_mean]_b[1.foreign]

---------------------------------------------------------------------------------
                |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
indirect_effect |  -1455.034   488.1763    -2.98   0.003    -2411.841   -498.2255
---------------------------------------------------------------------------------

If you don't care about the standard errors, this can be done with two separate regressions rather than Seemingly Unrelated Estimation.

dimitriy
  • Don't you mean $(t,x)$ independent of $\epsilon$, and not $(d,x)$? And do you not also want to add $u$, such that $(t,x)$ is also independent of $u$? Otherwise there is a problem in the reduced form. – Jesper for President Dec 08 '20 at 20:53
  • @JesperforPresident You are right on the first point. The second is implicit in the DAG, if I am not mistaken. – dimitriy Dec 08 '20 at 22:07
  • Yes, I agree it is implicit in the DAG. I raised it only because the comments above discuss whether it is necessary to "do something about the influence of U". Obviously it is not, and I think a virtue of your answer is that your reduced form shows this very clearly as soon as one sees that $u$ is independent of $t$, so that the standard conditions for OLS estimation are satisfied. I only wanted to make it completely explicit given the apparent confusion expressed in the comments above. – Jesper for President Dec 08 '20 at 22:15
  • '+1' for adding the reduced form. – Jesper for President Dec 08 '20 at 22:15