Cameron and Trivedi (2005) (*Microeconometrics: Methods and Applications*, p. 834) give a very informative description of the variance estimator for a linear model with clustered errors:
$$\widehat{\text{var}} \left( \hat{\beta} \right)_{\text{cluster}} = \left( \sum_{c=1}^C x'_cx_c \right)^{-1} \left[ \sum_{c = 1}^C \sum_{j = 1}^{N_c} \sum_{k = 1}^{N_c} \hat{u}_{jc} \hat{u}_{kc} x_{jc} x'_{kc} \right] \left( \sum_{c=1}^C x'_cx_c \right)^{-1}$$
Here there are $C$ clusters, the $c$-th containing $N_c$ observations, and $x_c$ is the $N_c \times K$ matrix of regressors for cluster $c$ (so $\sum_c x'_c x_c = X'X$). $\hat{u}_{jc}$ is the residual of observation $j$ in cluster $c$, and $x_{jc}$ is the vector of regressor values for that observation.
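A useful way to see the structure of the middle term: the inner double sum for a given cluster is just the outer product of that cluster's score vector with itself, $ \sum_{j} \sum_{k} \hat{u}_{jc} \hat{u}_{kc} x_{jc} x'_{kc} = \left( \sum_{j} \hat{u}_{jc} x_{jc} \right) \left( \sum_{k} \hat{u}_{kc} x_{kc} \right)' $. Here is a minimal numpy sketch of the formula; the function name and interface are my own invention, not from Cameron and Trivedi or any particular package:

```python
import numpy as np

def cluster_robust_vcov(X, resid, cluster_ids):
    """Cluster-robust sandwich estimator for OLS, per the formula above.

    X           : (N, K) regressor matrix
    resid       : (N,) OLS residuals u-hat
    cluster_ids : (N,) cluster label for each observation
    """
    K = X.shape[1]
    bread = np.linalg.inv(X.T @ X)           # (sum_c x_c' x_c)^{-1} = (X'X)^{-1}
    meat = np.zeros((K, K))
    for c in np.unique(cluster_ids):
        in_c = cluster_ids == c
        score_c = X[in_c].T @ resid[in_c]    # sum_j u_jc * x_jc, a K-vector
        meat += np.outer(score_c, score_c)   # equals the double sum over j and k
    return bread @ meat @ bread
```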
Now compare this to the Huber-White (HW) robust variance estimator, which does not account for clustering:
$$\widehat{\text{var}} \left( \hat{\beta} \right)_{HW} = (X'X)^{-1} \left[ \sum_{i=1}^N \hat{u}_i^2 x_ix_i' \right] (X'X)^{-1}$$
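And a matching sketch for the HW form (the HC0 variant, with no small-sample correction); same caveat that this is a bare-bones illustration rather than production code:

```python
import numpy as np

def hw_robust_vcov(X, resid):
    """Huber-White (HC0) sandwich estimator for OLS."""
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (resid[:, None] ** 2 * X)   # sum_i u_i^2 x_i x_i' = X' diag(u^2) X
    return bread @ meat @ bread
```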
There are several instructive observations that relate to your situation:
- Note that if each observation were its own cluster ($N_c = 1$ for every $c$), then the cluster estimator would be identical to the HW estimator (the simulation sketch after this list checks this numerically).
- Focus on the middle term (the three nested summations) in the cluster estimator. Wherever $ j = k $ in the inner two summations, a residual gets multiplied by itself, which is always nonnegative. So we always retain the standard sum of $ \hat{u}_{jc}^2 x_{jc}x'_{jc} $ terms that appears in the basic HW estimator.
- If the errors within a cluster are indeed independent of each other, then we should have $ \mathbf{E} \left[ \hat{u}_{jc} \hat{u}_{kc} \right] = 0 $ for $ j \neq k $, since independent zero-mean errors are uncorrelated.
- Likewise we'd then have $ \mathbf{E} \left[ \hat{u}_{jc} \hat{u}_{kc} x_{jc} x'_{kc} \right] = 0 $ for $ j \neq k $, since the $ x $ values are treated as fixed (non-random).
- Thus, if all of the observations within a cluster are indeed completely independent of each other, then this estimator will be identical (at least asymptotically) to the basic HW estimator.
- But, to the extent that we have $ \mathbf{E} \left[ \hat{u}_{jc} \hat{u}_{kc} \right] \neq 0 $ for $ j \neq k $ within the cluster, then the two will differ.
- Generally speaking, when the two differ it will be because $ \mathbf{E} \left[ \hat{u}_{jc} \hat{u}_{kc} \right] > 0 $, i.e. the errors are positively correlated within the cluster. In this case the cluster estimator will be greater than the HW estimator, because the cross terms in the inner summation add positive contributions (illustrated in the sketch after this list).
- But it is in theory possible to have $ \mathbf{E} \left[ \hat{u}_{jc} \hat{u}_{kc} \right] < 0 $, i.e. negative correlation of errors within a cluster. In that case the cluster variance estimator would be less than the basic HW one.
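To illustrate these points, here is a small simulation sketch reusing the two functions above (the data-generating process and all numbers are invented purely for illustration). A shared cluster-level shock in both the regressor and the error makes observations positively correlated within clusters, so the cluster SEs come out larger than the HW SEs, while assigning every observation to its own cluster reproduces the HW numbers exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
C, Nc = 200, 5                                 # 200 clusters of 5 observations each
N = C * Nc
ids = np.repeat(np.arange(C), Nc)

# Both the regressor and the error contain a cluster-level component,
# so errors are positively correlated within clusters.
x = np.repeat(rng.normal(size=C), Nc) + 0.5 * rng.normal(size=N)
u = np.repeat(rng.normal(size=C), Nc) + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([1.0, 2.0]) + u

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

se_hw = np.sqrt(np.diag(hw_robust_vcov(X, resid)))
se_cl = np.sqrt(np.diag(cluster_robust_vcov(X, resid, ids)))
se_1 = np.sqrt(np.diag(cluster_robust_vcov(X, resid, np.arange(N))))

print(se_cl > se_hw)              # cluster SEs larger under positive correlation
print(np.allclose(se_1, se_hw))   # True: singleton clusters reproduce HW exactly
```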
So, what does this mean for your situation in particular? As one of the comments to your question noted, clustering at the larger level is more conservative (and likely better). It allows for the possibility of correlation between the errors of observations within a given country, since the residual of each observation in a country gets multiplied by the residuals of all the other observations in that country.

To the extent that the residuals within a country actually are independent of each other (e.g. the residuals from two different regions in a country), we'll have, at least in expectation, $ \mathbf{E} \left[ \hat{u}_{jc} \hat{u}_{kc} \right] = 0 $ for $ j \neq k $, and nothing is lost. But to the extent that there actually is correlation across regions in a country, you'll have $ \mathbf{E} \left[ \hat{u}_{jc} \hat{u}_{kc} \right] \neq 0 $ for $ j \neq k $. In that case, those terms add to the variance of your estimator, but that is the appropriate thing in such a situation: if your observations and errors are correlated, then you don't actually have as many independent observations as your raw sample size would indicate. As they say, the innocent (i.e. genuinely independent observations) have nothing to hide.