
My problem comes from page 94 of the following book (http://www.development.wne.uw.edu.pl/uploads/Main/recrut_econometrics.pdf).

They say that the covariance between a single dummy instrumental variable $z$, which equals one with probability $p$, and the dependent variable $y$ is $\operatorname{Cov}(y,z) = (E(y\mid z=1) - E(y\mid z=0))\,p(1-p)$. They say this is easy to show, but I can't figure it out. I tried the usual covariance formulas, but they didn't help. Does anybody know how to derive it?
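Before deriving it, the identity can be checked exactly on a small sample. This is a sketch under stated assumptions: the toy data are hypothetical, the covariance uses divisor $n$ (the empirical distribution, so $p$ is the sample share of $z=1$ and the identity holds exactly), and `Fraction` is used only to avoid rounding error.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def population_cov(y, z):
    """Covariance with divisor n (the empirical distribution), not n - 1."""
    my, mz = mean(y), mean(z)
    return mean((yi - my) * (zi - mz) for yi, zi in zip(y, z))


def dummy_cov_formula(y, z):
    """(E[y | z=1] - E[y | z=0]) * p * (1 - p), with p the share of z == 1."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    p = Fraction(len(y1), len(z))
    return (mean(y1) - mean(y0)) * p * (1 - p)


# Hypothetical toy data: arbitrary y values and a 0/1 dummy z.
y = [Fraction(v) for v in (3, 5, 2, 8, 7, 1, 4)]
z = [0, 1, 0, 1, 1, 0, 0]

print(population_cov(y, z))     # 50/49
print(dummy_cov_formula(y, z))  # 50/49
```

With the usual $n-1$ divisor the left side is multiplied by $n/(n-1)$, so the exact match relies on the divisor-$n$ convention.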

kjetil b halvorsen
leo
  • Although it does take some algebra when starting with standard formulas, it follows directly from the characterization I gave at https://stats.stackexchange.com/a/18200/919. That's because the only rectangles with nonzero area must have one vertex at $Z=0$, with probability $1-p$, and the other at $Z=1$, with probability $p$; and conditional on that, you just need to find the expectation of the rectangle's area, which is the average value of $Y$ when $Z=1$ minus the average value of $Y$ when $Z=0$. That, at least, shows how the terms in this formula arise. – whuber Jan 12 '18 at 20:35
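One way to make the rectangle picture in this comment concrete, as a sketch: assume the normalization $\operatorname{Cov}(Y,Z) = \tfrac12 E[(Z_1-Z_2)(Y_1-Y_2)]$ for two independent draws (the linked answer may scale the areas differently). Only ordered pairs with different $z$ give a nonzero area, a fraction $2p(1-p)$ of all pairs, and for those the expected signed area is $E[Y\mid Z=1]-E[Y\mid Z=0]$. The data and helper names below are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rectangle_cov(y, z):
    """Half the average signed rectangle area over all ordered pairs (i, j).

    The pair spans a rectangle with signed area (z_i - z_j) * (y_i - y_j);
    averaging over all n * n ordered pairs and halving gives the
    divisor-n covariance.
    """
    n = len(y)
    total = sum(
        ((z[i] - z[j]) * (y[i] - y[j]) for i, j in product(range(n), repeat=2)),
        Fraction(0),
    )
    return total / (2 * n * n)


def nonzero_pairs(z):
    """Ordered pairs whose rectangle can have nonzero area (z differs)."""
    n = len(z)
    return [(i, j) for i, j in product(range(n), repeat=2) if z[i] != z[j]]


# Hypothetical toy data: arbitrary y values and a 0/1 dummy z.
y = [Fraction(v) for v in (3, 5, 2, 8, 7, 1, 4)]
z = [0, 1, 0, 1, 1, 0, 0]

print(rectangle_cov(y, z))                  # 50/49
print(len(nonzero_pairs(z)), len(z) ** 2)   # 24 49
```

Here $24/49 = 2 \cdot \tfrac37 \cdot \tfrac47 = 2p(1-p)$, matching the share of pairs the comment says can contribute.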

2 Answers


Consider the ordinary least squares fit of $Y$ to $Z$:

[Figure: scatterplot of $Y$ against $Z$ with the least squares line]

Because the mean of univariate data minimizes the sum of squared residuals, this fit must rise from the mean of the $Y$ values associated with $Z=0$ (left-hand red point) to the mean of the $Y$ values associated with $Z=1$ (right-hand red point). Since $Z$ changes by $1-0=1$, its slope $\beta$ is the difference of these means:

$$\beta = (E[Y\mid Z=1] - E[Y\mid Z=0]).$$

This formula holds whether the variables refer to data or to a bivariate distribution.
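With a 0/1 regressor the fitted line takes only two values, $a$ at $Z=0$ and $a+\beta$ at $Z=1$, so the sum of squared residuals splits into two separate sums, each minimized by its group mean. A minimal exact sketch on hypothetical data, solving the least squares normal equations directly (Cramer's rule, chosen here so the slope is not computed from the covariance formula):

```python
from fractions import Fraction


def least_squares_line(x, y):
    """Exact intercept a and slope b from the 2x2 normal equations.

    n*a      + sum(x)*b   = sum(y)
    sum(x)*a + sum(x^2)*b = sum(x*y)
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y, Fraction(0))
    sxy = sum((xi * yi for xi, yi in zip(x, y)), Fraction(0))
    det = n * sxx - sx * sx  # nonzero unless x is constant
    a = (sy * sxx - sx * sxy) / det  # Cramer's rule for the intercept
    b = (n * sxy - sx * sy) / det    # Cramer's rule for the slope
    return a, b


# Hypothetical toy data: arbitrary y values and a 0/1 regressor z.
y = [Fraction(v) for v in (3, 5, 2, 8, 7, 1, 4)]
z = [0, 1, 0, 1, 1, 0, 0]

a, b = least_squares_line(z, y)
y0 = [yi for yi, zi in zip(y, z) if zi == 0]
y1 = [yi for yi, zi in zip(y, z) if zi == 1]

# Fitted value at Z=0 versus the mean of y where z=0.
print(a, sum(y0) / len(y0))      # 5/2 5/2
# Fitted value at Z=1 versus the mean of y where z=1.
print(a + b, sum(y1) / len(y1))  # 20/3 20/3
# The slope is the difference of the two group means.
print(b)                         # 25/6
```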

However, the usual formula for the slope asserts it equals the covariance of $(Z,Y)$ divided by the variance of $Z$:

$$\beta = \frac{\operatorname{Cov}(Y,Z) }{ \operatorname{Var}(Z) }.$$

When $Z$ is Bernoulli$(p)$, its variance is $p(1-p)$. Solving for the covariance in terms of the slope and the variance of $Z$ gives the stated formula:

$$\operatorname{Cov}(Y,Z) = \beta\, p(1-p) = (E[Y\mid Z=1] - E[Y\mid Z=0])\, p(1-p).$$
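Since the argument is stated for a bivariate distribution as well as for data, here is a sketch at the distribution level. The joint probability mass function below is hypothetical; the code computes $\operatorname{Var}(Z)$, $\operatorname{Cov}(Y,Z)$ and $\beta = E[Y\mid Z=1]-E[Y\mid Z=0]$ exactly from their definitions and compares them.

```python
from fractions import Fraction

# Hypothetical joint pmf of (Y, Z): keys are (y, z) pairs, values are probabilities.
pmf = {
    (0, 0): Fraction(1, 5),
    (2, 0): Fraction(3, 10),
    (1, 1): Fraction(1, 10),
    (4, 1): Fraction(2, 5),
}


def expect(f):
    """E[f(Y, Z)] under the joint pmf."""
    return sum((prob * f(y, z) for (y, z), prob in pmf.items()), Fraction(0))


def cond_mean_y(z_value):
    """E[Y | Z = z_value] = E[Y * 1{Z = z_value}] / P(Z = z_value)."""
    p_z = expect(lambda y, z: 1 if z == z_value else 0)
    return expect(lambda y, z: y if z == z_value else 0) / p_z


p = expect(lambda y, z: z)                    # P(Z = 1) = E[Z]
var_z = expect(lambda y, z: z * z) - p ** 2   # Var(Z) from its definition
cov_yz = expect(lambda y, z: y * z) - expect(lambda y, z: y) * p
beta = cond_mean_y(1) - cond_mean_y(0)

print(p, var_z, p * (1 - p))   # 1/2 1/4 1/4
print(cov_yz, beta * var_z)    # 11/20 11/20
```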

whuber

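The standard formula does work; it just needs a bit of manipulation. Since $z$ is $0$ or $1$, $yz = 0$ whenever $z=0$ and $yz = y$ whenever $z=1$, so

$$Cov(y,z) = E(yz) - E(y)E(z) = E(yz\mid z=1)P(z=1) - E(y)p = [E(y\mid z=1) - E(y)]p.$$

By the law of total expectation, $E(y) = E(y\mid z=1)p + E(y\mid z=0)(1-p)$, so

$$[E(y\mid z=1) - E(y)]p = \Big[E(y\mid z=1) - \big[E(y\mid z=1)p + E(y\mid z=0)(1-p)\big]\Big]p$$

$$=\Big[ E(y \mid z=1)(1-p) - E(y \mid z=0)(1-p) \Big]p$$

$$=\Big[ E(y \mid z=1) - E(y \mid z=0) \Big]p(1-p)$$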
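Each step of this derivation can be checked exactly on a small sample, treating the sample as its own (empirical) distribution. This is a sketch: the data and the `mean` helper are hypothetical, and every entry of `steps` should evaluate to the same number.

```python
from fractions import Fraction

# Hypothetical sample: y values and a 0/1 dummy z (empirical distribution, divisor n).
y = [Fraction(v) for v in (3, 5, 2, 8, 7, 1, 4)]
z = [0, 1, 0, 1, 1, 0, 0]
n = len(y)


def mean(values):
    """Exact arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


p = Fraction(sum(z), n)                               # P(z = 1)
e_y = mean(y)                                         # E(y)
e_y1 = mean(yi for yi, zi in zip(y, z) if zi == 1)    # E(y | z = 1)
e_y0 = mean(yi for yi, zi in zip(y, z) if zi == 0)    # E(y | z = 0)

steps = [
    mean(yi * zi for yi, zi in zip(y, z)) - e_y * p,  # E(yz) - E(y)E(z)
    e_y1 * p - e_y * p,                               # E(yz | z=1)P(z=1) - E(y)p
    (e_y1 - e_y) * p,                                 # [E(y | z=1) - E(y)]p
    (e_y1 - (e_y1 * p + e_y0 * (1 - p))) * p,         # substitute total expectation
    (e_y1 * (1 - p) - e_y0 * (1 - p)) * p,            # collect terms
    (e_y1 - e_y0) * p * (1 - p),                      # final formula
]

print(all(s == steps[0] for s in steps), steps[0])    # True 50/49
```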

Alecos Papadopoulos