Covariance, Bernoulli distribution and instrumental variables

Question

My problem comes from page 94 of the following book (http://www.development.wne.uw.edu.pl/uploads/Main/recrut_econometrics.pdf).

They say the covariance between a single dummy instrumental variable (z), which equals one with probability p, and the dependent variable (y) is: Cov(y,z) = (E(y | z=1) - E(y | z=0)) p(1-p). They say this is easy to show, but I can't figure it out. I tried using the usual covariance formulas, but that didn't help. Does anybody know an answer?

Although it does take some algebra when starting with standard formulas, it follows directly from the characterization I gave at https://stats.stackexchange.com/a/18200/919. That's because the only rectangles with nonzero area must have one vertex at $Z=0$, with probability $1-p$, and the other at $Z=1$, with probability $p$; and conditional on that, you just need to find the expectation of the rectangle's area, which is the average value of $Y$ when $Z=1$ minus the average value of $Y$ when $Z=0$. That, at least, shows how the terms in this formula arise. — whuber, Jan 12 '18 at 20:35
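The comment's sketch can be written out as a short derivation. This assumes the linked characterization amounts to the standard identity that the covariance is half the expected signed area of the rectangle spanned by two independent draws $(Z_1,Y_1)$ and $(Z_2,Y_2)$ from the joint distribution; the subscripted symbols are introduced here only for illustration.

$$\operatorname{Cov}(Y,Z) = \tfrac{1}{2}\, E\big[(Z_1-Z_2)(Y_1-Y_2)\big].$$

The product is zero unless $Z_1 \ne Z_2$, which happens with probability $2p(1-p)$. Given that event, $(Z_1-Z_2)(Y_1-Y_2)$ is the $Y$ value drawn alongside $Z=1$ minus the $Y$ value drawn alongside $Z=0$, so by independence of the two draws its conditional expectation is $E[Y\mid Z=1]-E[Y\mid Z=0]$. Therefore

$$\operatorname{Cov}(Y,Z) = \tfrac{1}{2}\cdot 2p(1-p)\,\big(E[Y\mid Z=1]-E[Y\mid Z=0]\big) = \big(E[Y\mid Z=1]-E[Y\mid Z=0]\big)\,p(1-p).$$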

score 7 · Answer 1 · answered Jan 12 '18 at 20:56

Consider the ordinary least squares fit of $Y$ to $Z$:

Because the mean of univariate data minimizes the sum of squared residuals, this fit must rise from the mean of the $Y$ values associated with $Z=0$ (left-hand red point) to the mean of the $Y$ values associated with $Z=1$ (right-hand red point). Since $Z$ changes by $1-0=1$ between these two points, the slope $\beta$ of the fit is the difference of these means:

$$\beta = (E[Y\mid Z=1] - E[Y\mid Z=0]).$$

This formula holds whether the variables refer to data or to a bivariate distribution.

However, the usual formula for the slope asserts it equals the covariance of $(Z,Y)$ divided by the variance of $Z$:

$$\beta = \frac{\operatorname{Cov}(Y,Z) }{ \operatorname{Var}(Z) }.$$

When $Z$ is Bernoulli$(p)$, its variance is $p(1-p)$. Solving for the covariance in terms of the slope and the variance of $Z$ gives the stated formula:

$$\operatorname{Cov}(Y,Z) = \beta\, p(1-p) = (E[Y\mid Z=1] - E[Y\mid Z=0])\, p(1-p).$$
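The argument above can be checked numerically on simulated data. The sketch below is only an illustration: the data-generating model, the function name `check_bernoulli_covariance`, and its parameters are assumptions made for this example, not part of the original answer. With the population-style covariance (`ddof=0`) and the sample proportion in place of $p$, the identity holds exactly for the sample, up to floating-point error.

```python
import numpy as np


def check_bernoulli_covariance(n=10_000, p=0.3, seed=0):
    """Compare the OLS slope, the difference of group means, and the covariance."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, p, size=n)            # Bernoulli(p) dummy variable
    y = 2.0 + 1.5 * z + rng.normal(size=n)    # hypothetical outcome model

    p_hat = z.mean()                                  # sample analogue of p
    diff = y[z == 1].mean() - y[z == 0].mean()        # difference of group means
    slope = np.polyfit(z, y, 1)[0]                    # OLS slope of y on z
    cov = np.cov(y, z, ddof=0)[0, 1]                  # covariance with divisor n

    return {
        "slope": slope,
        "diff": diff,
        "cov": cov,
        "formula": diff * p_hat * (1 - p_hat),
    }


result = check_bernoulli_covariance()
print(result)
```

In the printed dictionary, `slope` should match `diff`, and `cov` should match `formula`, whatever seed or outcome model is used, provided both groups are non-empty.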

score 2 · Accepted Answer · answered Jan 13 '18 at 03:25

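The standard formula does work; it just needs a bit of manipulation. Because $z$ is either 0 or 1, the product $yz$ vanishes when $z=0$ and equals $y$ when $z=1$, and $E(z)=p$: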

$$Cov(y,z) = E(yz) - E(y)E(z) = E(yz\mid z=1)P(z=1) - E(y)p$$

$$= [E(y\mid z=1) - E(y)]p = \Big[E(y\mid z=1) - \big[E(y\mid z=1)p + E(y\mid z=0)(1-p)\big]\Big]p$$

$$=\Big[ E(y \mid z=1)(1-p) - E(y \mid z=0)(1-p) \Big]p$$

$$=\Big[ E(y \mid z=1) - E(y \mid z=0) \Big]p(1-p)$$
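As a sanity check on the algebra, the identity can be verified in exact arithmetic on a small discrete joint distribution. The probability table `pmf` and the helper names below are hypothetical, chosen only for illustration; any valid joint distribution with $0 < p < 1$ should give the same agreement.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(y, z); the probabilities sum to 1 and P(z=1) = 3/10.
pmf = {
    (0, 0): F(2, 10),
    (2, 0): F(3, 10),
    (5, 0): F(2, 10),
    (1, 1): F(1, 10),
    (4, 1): F(2, 10),
}


def expect(f):
    """Expectation of f(y, z) under the joint pmf."""
    return sum(prob * f(y, z) for (y, z), prob in pmf.items())


def cond_mean(z0):
    """E(y | z = z0), computed as E(y * 1{z = z0}) / P(z = z0)."""
    prob_z0 = expect(lambda y, z: int(z == z0))
    return expect(lambda y, z: y * int(z == z0)) / prob_z0


p = expect(lambda y, z: z)                                   # P(z = 1) = E(z)
cov = expect(lambda y, z: y * z) - expect(lambda y, z: y) * p  # E(yz) - E(y)E(z)
formula = (cond_mean(1) - cond_mean(0)) * p * (1 - p)

print(cov, formula)
```

Using `Fraction` avoids rounding, so the two printed values can be compared for exact equality rather than approximate agreement.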
