I am interested in estimating the effect of security S on crime C in a given city over time (eight quarters) for twenty cities, so it's panel data. The problem is that instead of actual security spending I have two imperfect measures of S, call them S1 and S2. These were collected independently of each other, but in both measures some random mistakes caused mismeasurement. I know that if I use either of the two as a proxy for S, my regression coefficient will be biased towards zero because of attenuation.

Now I have two questions: why does attenuation bias lead to downward biased coefficients? Can I use S1 and S2 somehow to estimate the causal effect of S on C?

Thanks

user45086

1 Answer

Here's the intuition for why measurement error in a linear pooled panel model is potentially more worrisome than in the cross-section case.

Suppose the true DGP is $$y_{it}=\beta x^{*}_{it}+u_{it},$$ but we actually observe $$x_{it}=x^{*}_{it} + v_{it}.$$

If we have data with more than one period, we might be worried that $x^{*}_{it}$ is positively correlated over time for a given $i.$ Call that first-order serial correlation $\rho$.

If we estimate a first-differences regression $$\Delta y_{it}=\beta \Delta x^{*}_{it} + \Delta u_{it}=\beta \Delta x_{it} + \Delta u_{it}-\beta \Delta v_{it},$$ then it can be shown that $\hat \beta$ is biased toward zero. With $\sigma^2_{x^{*}}$ and $\sigma^2_v$ denoting the variances of $x^{*}_{it}$ and $v_{it}$: $$ \DeclareMathOperator{\plim}{plim} \plim \hat \beta = \beta - \frac{\beta \sigma^2_v}{(1-\rho)\sigma^2_{x^{*}}+\sigma^2_v}$$
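A sketch of where that expression comes from, assuming (beyond what is stated above) that $v_{it}$ is white noise uncorrelated with $x^{*}_{it}$ and $u_{it}$, and that $x^{*}_{it}$ is stationary with variance $\sigma^2_{x^{*}}$ and first-order autocorrelation $\rho$. Differencing gives

$$\operatorname{Var}(\Delta x^{*}_{it}) = 2(1-\rho)\sigma^2_{x^{*}}, \qquad \operatorname{Var}(\Delta v_{it}) = 2\sigma^2_v,$$

so the usual attenuation formula applied to the differenced variables is

$$\operatorname{plim} \hat \beta = \frac{\operatorname{Cov}(\Delta x_{it}, \Delta y_{it})}{\operatorname{Var}(\Delta x_{it})} = \beta\,\frac{(1-\rho)\sigma^2_{x^{*}}}{(1-\rho)\sigma^2_{x^{*}}+\sigma^2_v} = \beta - \frac{\beta \sigma^2_v}{(1-\rho)\sigma^2_{x^{*}}+\sigma^2_v}.$$

Differencing removes much of the persistent signal in $x^{*}$ but none of the noise, which is why the signal-to-noise ratio falls as $\rho$ rises.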

This inconsistency is larger than in a cross-section as long as $\rho>0$. As $\rho \rightarrow 1$, it becomes severe: the probability limit of $\hat \beta$ shrinks to zero. As $\rho \rightarrow 0$, you get the familiar cross-sectional attenuation result. You can sometimes make things better if you use long differences (like year on year) rather than first differences, since the correlation between more distant periods may be smaller. In other words, if spending does not change much over time, you should worry.
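The dependence on $\rho$ can be checked with a small simulation. This is only an illustrative sketch with made-up parameter values: it assumes two periods, normally distributed variables, white-noise measurement error, and hypothetical function names (`simulate_fd_slope`, `fd_plim`) that are not part of any library.

```python
import numpy as np

def simulate_fd_slope(rho, beta=1.0, sigma_x=1.0, sigma_v=0.5,
                      n_units=200_000, seed=0):
    """First-difference OLS slope using a mismeasured regressor (two periods)."""
    rng = np.random.default_rng(seed)
    # True regressor x* with variance sigma_x^2 and serial correlation rho.
    x1 = rng.normal(0.0, sigma_x, n_units)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(0.0, sigma_x, n_units)
    # Outcome errors u and measurement errors v, independent across periods.
    u = rng.normal(0.0, 1.0, (2, n_units))
    v = rng.normal(0.0, sigma_v, (2, n_units))
    dy = (beta * x2 + u[1]) - (beta * x1 + u[0])
    dx = (x2 + v[1]) - (x1 + v[0])  # difference of the observed regressor
    return np.cov(dx, dy)[0, 1] / np.var(dx, ddof=1)

def fd_plim(rho, beta=1.0, sigma_x=1.0, sigma_v=0.5):
    """Probability limit of the first-difference slope from the formula above."""
    return beta - beta * sigma_v**2 / ((1.0 - rho) * sigma_x**2 + sigma_v**2)

if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(rho, simulate_fd_slope(rho), fd_plim(rho))
```

With these assumed values the simulated slope should track the formula closely and fall further below the true $\beta = 1$ as $\rho$ increases.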

As for identification, you can use instrumental variables (perhaps using one measure of S as an IV for the other). That may work depending on the kind of measurement error process you have. Another approach might be to use the ranks rather than the levels of spending, since ranks might have less error.
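To see why instrumenting one noisy measure with the other can help, here is a minimal cross-sectional sketch. It assumes classical measurement errors that are independent of each other and of everything else, no simultaneity between C and S, and made-up parameter values; `ols_and_iv` is a hypothetical helper, not a library function.

```python
import numpy as np

def ols_and_iv(beta=-0.5, sigma_s=1.0, sigma_e=0.7, n=200_000, seed=0):
    """Compare OLS on one noisy measure with IV using the second measure."""
    rng = np.random.default_rng(seed)
    s = rng.normal(0.0, sigma_s, n)          # true (unobserved) spending S
    s1 = s + rng.normal(0.0, sigma_e, n)     # first noisy measure
    s2 = s + rng.normal(0.0, sigma_e, n)     # second measure, independent error
    c = beta * s + rng.normal(0.0, 1.0, n)   # outcome depends on true S only
    # OLS of C on S1: attenuated toward zero.
    ols = np.cov(s1, c)[0, 1] / np.var(s1, ddof=1)
    # Simple IV (Wald) estimator with S2 as the instrument for S1.
    iv = np.cov(s2, c)[0, 1] / np.cov(s2, s1)[0, 1]
    return ols, iv

if __name__ == "__main__":
    ols, iv = ols_and_iv()
    print("OLS:", ols, "IV:", iv)
```

Under these assumptions OLS converges to $\beta\,\sigma^2_S/(\sigma^2_S+\sigma^2_e)$, while the IV ratio converges to $\beta$ because the error in S2 is uncorrelated with the error in S1. If the two errors were correlated, or if crime fed back into spending, the IV estimate would no longer be consistent for the causal effect.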

None of this solves the main endogeneity problem between crime and expenditures: when crime increases, more police are hired to combat it and expenditures go up. Steve Levitt has used the timing of mayoral and gubernatorial elections as an instrumental variable to identify a causal effect of police on crime. Justin McCrary showed that this result was driven by a programming error, which Levitt acknowledges, though he provides some alternative evidence for this theory. McCrary and Chalfin have a nice paper on this topic which you should read, since it deals with measurement error very similar to yours. When the two mismeasurements are independent, the IV strategy I mentioned above only recovers what OLS would estimate if spending were measured without error, so the simultaneity remains, though they provide some guidance about how to recover a causal effect.

dimitriy