I have data on operation success for many doctors. I estimated a regression in Stata with fixed effects for the individual doctors. I first ran the regression with the robust option; the resulting t-values for the individual doctor estimates ranged from 2.17 to 6.14. Then I re-ran it with the vce(cluster doctor) option. I expected the standard errors to become larger. However, I instead got much smaller standard errors -- for example, 1.04e-14. That is just too good to be true. Why is that? Is there any possible reason?
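Roughly, the two runs looked like this (a simplified sketch with placeholder variable names; the actual model has more covariates):

. regress success i.doctor, vce(robust)
. regress success i.doctor, vce(cluster doctor)

The first gave t-values on the doctor dummies between 2.17 and 6.14; the second gave the ~1e-14 standard errors.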
-
provide your code and output – StasK May 16 '14 at 06:33
2 Answers
You have effectively corrected for the individual doctor effects twice, using methods that simply do not work together.
If your model is regress outcome i.doctor, vce(cluster doctor), then Stata should have complained that you have exhausted your degrees of freedom. xtreg may not be as smart and may miss the fact that the fixed effects are perfectly determined. Those 1e-14 standard errors should have been identically zero; they are non-zero in practice only because of rounding somewhere in the guts of the fixed-effects estimation. What happens here is this: cluster variance estimation works by summing the cluster contributions over clusters. However, by specifying doctors as fixed effects, you force the residuals for a given doctor to sum to zero. regress knows how to determine this at the level of algebra; xtreg may not know enough computational linear algebra to do so, and simply sums up the (numerically) zero contributions, producing the implausibly small standard errors that you see here.
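Here is a minimal sketch of that within-doctor constraint (placeholder variable names outcome and doctor, as above):

. regress outcome i.doctor
. predict double ehat, residuals
. bysort doctor: egen double sum_ehat = total(ehat)
. summarize sum_ehat

The within-doctor sums of residuals are all (numerically) zero, so each doctor's cluster contribution for its own dummy vanishes, and the clustered variance for the dummies collapses.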

-
Thanks to both StasK and Dimitriy for the great help. To clarify, I did have other variables in my regression, i.e., "regress outcome weekly_dummies regime i.doctor, vce(cluster doctor)", in which weekly_dummies are dummies for weeks and regime is my main variable of interest (there is a policy switch in my data). I did not get the complaint from regress that StasK described. Furthermore, could you please clarify how to remedy my issue? I am still confused, especially since I used a similar specification on other data and there was no such issue. Is this data-specific? Many thanks again. – user44968 May 17 '14 at 16:47
-
@StasK, could we then use i.hh and robust s.e. instead of cluster(hh)? (I am asking this of Dimitriy as well, above -- I am trying my chances with this old but super-useful post) – Fuca26 May 06 '21 at 20:58
If I understand your problem correctly, this can happen when the intra-cluster correlations are negative. See the Stata FAQ for the therapist example with some intuition.
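Roughly speaking, under the usual Moulton-style design-effect approximation (assuming equal cluster sizes $\bar m$ and a common intra-cluster correlation $\rho$),

$$\operatorname{Var}_{\text{cluster}}(\hat\beta) \approx \bigl[1 + (\bar m - 1)\rho\bigr]\,\operatorname{Var}_{\text{iid}}(\hat\beta),$$

so with $\rho < 0$ the clustered standard errors can indeed come out smaller than the unclustered ones, though they should not be numerically zero.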
Edit:
I think Stas is right about the deeper issue. I was too hasty. Here's my attempt to replicate this with a dataset of pharmacy visits by 27,766 Vietnamese villagers that are nested in 5,740 households in 194 villages (data are from Cameron and Trivedi). I could not find a public dataset where the clustered errors were smaller, but I think this illustrates the main point. I will treat pharmacy visits as continuous, though they clearly are not.
First, we set up the data:
. use "http://cameron.econ.ucdavis.edu/mmabook/vietnam_ex2.dta", clear
. egen hh=group(lnhhinc)
(1 missing value generated)
. bys hh: gen person = _n
. xtset hh person
panel variable: hh (unbalanced)
time variable: person, 1 to 19
delta: 1 unit
. xtdes
hh: 1, 2, ..., 5740 n = 5740
person: 1, 2, ..., 19 T = 19
Delta(person) = 1 unit
Span(person) = 19 periods
(hh*person uniquely identifies each observation)
Distribution of T_i: min 5% 25% 50% 75% 95% max
1 2 4 5 6 8 19
(snip)
Now for the FE regression of visits on days sick:
. xtreg PHARVIS ILLDAYS, fe
Fixed-effects (within) regression Number of obs = 27765
Group variable: hh Number of groups = 5740
R-sq: within = 0.1145 Obs per group: min = 1
between = 0.1390 avg = 4.8
overall = 0.1257 max = 19
F(1,22024) = 2848.23
corr(u_i, Xb) = 0.0465 Prob > F = 0.0000
------------------------------------------------------------------------------
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0788618 .0014777 53.37 0.000 .0759654 .0817581
_cons | .2906284 .0077221 37.64 0.000 .2754925 .3057643
-------------+----------------------------------------------------------------
sigma_u | .85814688
sigma_e | 1.085808
rho | .38447214 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(5739, 22024) = 2.35 Prob > F = 0.0000
Clustering on the panel variable inflates the standard errors:
. xtreg PHARVIS ILLDAYS, fe vce(cluster hh)
Fixed-effects (within) regression Number of obs = 27765
Group variable: hh Number of groups = 5740
R-sq: within = 0.1145 Obs per group: min = 1
between = 0.1390 avg = 4.8
overall = 0.1257 max = 19
F(1,5739) = 464.54
corr(u_i, Xb) = 0.0465 Prob > F = 0.0000
(Std. Err. adjusted for 5740 clusters in hh)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0788618 .0036589 21.55 0.000 .0716889 .0860346
_cons | .2906284 .0102597 28.33 0.000 .2705154 .3107413
-------------+----------------------------------------------------------------
sigma_u | .85814688
sigma_e | 1.085808
rho | .38447214 (fraction of variance due to u_i)
------------------------------------------------------------------------------
Now I try a non-panel approach. I am using areg since Stata won't let me put in ~6K dummies.
. areg PHARVIS ILLDAYS, absorb(hh) vce(cluster hh)
Linear regression, absorbing indicators Number of obs = 27765
F( 1, 5739) = 368.52
Prob > F = 0.0000
R-squared = 0.4579
Adj R-squared = 0.3166
Root MSE = 1.0858
(Std. Err. adjusted for 5740 clusters in hh)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0788618 .0041081 19.20 0.000 .0708084 .0869151
_cons | .2906284 .0115192 25.23 0.000 .2680464 .3132103
-------------+----------------------------------------------------------------
hh | absorbed (5740 categories)
Unfortunately, areg obscures the thing you are interested in. If you use regress and limit the sample so the number of HHs is reasonable, you will get the tiny standard errors for clusters with only 1 villager. This makes sense since the residual for such observations will be exactly zero. Here's an example:
. reg PHARVIS ILLDAYS i.hh if inrange(hh,1,100), cluster(hh)
Linear regression Number of obs = 219
F( 0, 99) = .
Prob > F = .
R-squared = 0.6473
Root MSE = .88177
(Std. Err. adjusted for 100 clusters in hh)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0518095 .0314707 1.65 0.103 -.0106352 .1142542
|
hh |
2 | -1 1.84e-14 -5.4e+13 0.000 -1 -1
3 | .2590475 .1573536 1.65 0.103 -.0531762 .5712712
4 | .4662855 .2832365 1.65 0.103 -.0957171 1.028288
5 | 2.129524 .0786768 27.07 0.000 1.973412 2.285636
6 | 1 1.84e-14 5.4e+13 0.000 1 1
7 | -.585524 .2517657 -2.33 0.022 -1.085082 -.0859662
(snip)....
100 | -.8359366 .0996573 -8.39 0.000 -1.033678 -.6381949
|
_cons | .481905 .3147072 1.53 0.129 -.1425423 1.106352
------------------------------------------------------------------------------
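To check the residual claim directly, a quick sketch run right after the regression above (ehat and nhh are just names I am making up here):

. predict double ehat if e(sample), residuals
. egen nhh = count(ehat), by(hh)
. summarize ehat if nhh == 1

The residuals for households with a single villager in the sample are exactly zero (up to rounding), so those clusters contribute nothing to the clustered variance and their dummies get the ~1e-14 standard errors.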
Now I will cluster on the village, which inflates the standard errors somewhat, as expected, but they remain reasonable:
. reg PHARVIS ILLDAYS i.commune, cluster(commune)
Linear regression Number of obs = 27765
F( 0, 193) = .
Prob > F = .
R-squared = 0.1814
Root MSE = 1.1925
(Std. Err. adjusted for 194 clusters in commune)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0840634 .0056375 14.91 0.000 .0729444 .0951823
|
commune |
2 | -.1885549 .012027 -15.68 0.000 -.2122761 -.1648337
(snip) ....
191 | .4646775 .0014571 318.91 0.000 .4618037 .4675514
192 | -.0020317 .0065782 -0.31 0.758 -.0150061 .0109427
193 | -.2444578 .0115522 -21.16 0.000 -.2672426 -.2216731
194 | .1917803 .0002288 838.33 0.000 .1913291 .1922315
|
_cons | .4371527 .0200739 21.78 0.000 .3975602 .4767452
------------------------------------------------------------------------------
If I drop all other regressors and estimate something like what Stas suggests, I get the (numerically) zero standard errors on the commune dummies:
. reg PHARVIS i.commune, cluster(commune)
Linear regression Number of obs = 27765
F( 0, 193) = .
Prob > F = .
R-squared = 0.0656
Root MSE = 1.274
(Std. Err. adjusted for 194 clusters in commune)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
commune |
2 | -.0092138 1.72e-14 -5.4e+11 0.000 -.0092138 -.0092138
3 | -.2910319 1.72e-14 -1.7e+13 0.000 -.2910319 -.2910319
4 | -.3957457 1.72e-14 -2.3e+13 0.000 -.3957457 -.3957457
5 | -.4244865 1.72e-14 -2.5e+13 0.000 -.4244865 -.4244865
(snip) ....
191 | .4864051 1.72e-14 2.8e+13 0.000 .4864051 .4864051
192 | -.1001229 1.72e-14 -5.8e+12 0.000 -.1001229 -.1001229
193 | -.416719 1.72e-14 -2.4e+13 0.000 -.416719 -.416719
194 | .188369 1.72e-14 1.1e+13 0.000 .188369 .188369
|
_cons | .7364865 1.72e-14 4.3e+13 0.000 .7364865 .7364865
------------------------------------------------------------------------------

-
With negative ICCs, the standard errors can get a little bit smaller, but not numerically zero. – StasK May 16 '14 at 06:34
-
I don't see why this answer was downvoted. Obviously some respectable amount of work went into it. – Andy May 16 '14 at 23:31
-
Dimitriy, thank you so much for the replication. Could you please explain a bit more what you mean by "This makes sense since the residual for such observations will be exactly zero." in "reg PHARVIS ILLDAYS i.hh if inrange(hh,1,100), cluster(hh)"? – user44968 May 17 '14 at 16:59
-
The hh dummy will be equal to the error for that singleton observation. You can see this clearly if you run a regression of y on x and a dummy for observation 7. The prediction will be equal to the actual. – dimitriy May 17 '14 at 17:39
-
@DimitriyV.Masterov could we then use i.hh and robust s.e. instead of cluster(hh)? – Fuca26 May 06 '21 at 20:56
-
@Fuca26 That may fix one problem, but if there is some correlation in the errors for the same observation on different days, the robust SEs will be wrong. This is usually the case in social science settings. Moreover, the singleton dummy issue will persist. – dimitriy May 06 '21 at 21:17