I have data on operation success for many doctors. I estimated a regression in Stata with fixed effects for the individual doctors. I first ran the regression with the robust option; the resulting t-values for the individual doctor estimates ranged from 2.17 to 6.14. Then I re-ran it with the vce(cluster doctor) option. I expected the standard errors to become larger. However, I instead got much smaller standard errors -- for example, 1.04e-14. That is just too good to be true. Why is that? Is there any possible reason?
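Roughly, the two runs looked like this (a simplified sketch with placeholder variable names; the actual model has more covariates):

. regress success i.doctor, vce(robust)
. regress success i.doctor, vce(cluster doctor)

The first gave t-values on the doctor dummies between 2.17 and 6.14; the second gave the ~1e-14 standard errors.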
-
provide your code and output – StasK May 16 '14 at 06:33
2 Answers
You have effectively corrected for the individual doctor effects twice, using methods that simply do not work together.
If your model is regress outcome i.doctor, vce(cluster doctor), then Stata should have complained that you have exhausted your degrees of freedom. xtreg may not be as smart and may miss the fact that the fixed effects are perfectly determined. Those 1e-14 standard errors should have been identically zero; they are non-zero in practice only because of rounding somewhere in the guts of the fixed-effects estimation. What happens here is this: cluster variance estimation works by summing the cluster contributions over clusters. However, by specifying doctors as fixed effects, you force the residuals for a given doctor to sum to zero. regress knows how to determine this at the level of algebra; xtreg may not know enough computational linear algebra to do so, and simply sums up the (numerically) zero contributions, producing the implausibly small standard errors that you see here.
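Here is a minimal sketch of that within-doctor constraint (placeholder variable names outcome and doctor, as above):

. regress outcome i.doctor
. predict double ehat, residuals
. bysort doctor: egen double sum_ehat = total(ehat)
. summarize sum_ehat

The within-doctor sums of residuals are all (numerically) zero, so each doctor's cluster contribution for its own dummy vanishes, and the clustered variance for the dummies collapses.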

-
Thanks to both StasK and Dimitriy for the great help. To clarify, I did have other variables in my regression, i.e., "regress outcome weekly_dummies regime i.doctor, vce(cluster doctor)", in which weekly_dummies are dummies for weeks and regime is my main variable of interest (there is a policy switch in my data). I did not get the complaint from regress that StasK described. Furthermore, could you please clarify how to remedy my issue? I am still confused, especially since I used a similar specification on other data and there was no such issue. Is this data-specific? Many thanks again. – user44968 May 17 '14 at 16:47
-
@StasK, could we then use i.hh and robust s.e. instead of cluster(hh)? (I am asking this of Dimitriy as well, above -- I am trying my chances with this old but super-useful post) – Fuca26 May 06 '21 at 20:58
If I understand your problem correctly, this can happen when the intra-cluster correlations are negative. See the Stata FAQ for the therapist example with some intuition.
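Roughly speaking, under the usual Moulton-style design-effect approximation (assuming equal cluster sizes $\bar m$ and a common intra-cluster correlation $\rho$),

$$\operatorname{Var}_{\text{cluster}}(\hat\beta) \approx \bigl[1 + (\bar m - 1)\rho\bigr]\,\operatorname{Var}_{\text{iid}}(\hat\beta),$$

so with $\rho < 0$ the clustered standard errors can indeed come out smaller than the unclustered ones, though they should not be numerically zero.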
Edit:
I think Stas is right about the deeper issue. I was too hasty. Here's my attempt to replicate this with a dataset of pharmacy visits by 27,766 Vietnamese villagers that are nested in 5,740 households in 194 villages (data are from Cameron and Trivedi). I could not find a public dataset where the clustered errors were smaller, but I think this illustrates the main point. I will treat pharmacy visits as continuous, though they clearly are not.
First, we set up the data:
. use "http://cameron.econ.ucdavis.edu/mmabook/vietnam_ex2.dta", clear
. egen hh=group(lnhhinc)
(1 missing value generated)
. bys hh: gen person = _n
. xtset hh person
panel variable: hh (unbalanced)
time variable: person, 1 to 19
delta: 1 unit
. xtdes
hh: 1, 2, ..., 5740 n = 5740
person: 1, 2, ..., 19 T = 19
Delta(person) = 1 unit
Span(person) = 19 periods
(hh*person uniquely identifies each observation)
Distribution of T_i: min 5% 25% 50% 75% 95% max
1 2 4 5 6 8 19
(snip)
Now for the FE regression of visits on days sick:
. xtreg PHARVIS ILLDAYS, fe
Fixed-effects (within) regression Number of obs = 27765
Group variable: hh Number of groups = 5740
R-sq: within = 0.1145 Obs per group: min = 1
between = 0.1390 avg = 4.8
overall = 0.1257 max = 19
F(1,22024) = 2848.23
corr(u_i, Xb) = 0.0465 Prob > F = 0.0000
------------------------------------------------------------------------------
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0788618 .0014777 53.37 0.000 .0759654 .0817581
_cons | .2906284 .0077221 37.64 0.000 .2754925 .3057643
-------------+----------------------------------------------------------------
sigma_u | .85814688
sigma_e | 1.085808
rho | .38447214 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(5739, 22024) = 2.35 Prob > F = 0.0000
Clustering on the panel variable inflates the standard errors:
. xtreg PHARVIS ILLDAYS, fe vce(cluster hh)
Fixed-effects (within) regression Number of obs = 27765
Group variable: hh Number of groups = 5740
R-sq: within = 0.1145 Obs per group: min = 1
between = 0.1390 avg = 4.8
overall = 0.1257 max = 19
F(1,5739) = 464.54
corr(u_i, Xb) = 0.0465 Prob > F = 0.0000
(Std. Err. adjusted for 5740 clusters in hh)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0788618 .0036589 21.55 0.000 .0716889 .0860346
_cons | .2906284 .0102597 28.33 0.000 .2705154 .3107413
-------------+----------------------------------------------------------------
sigma_u | .85814688
sigma_e | 1.085808
rho | .38447214 (fraction of variance due to u_i)
------------------------------------------------------------------------------
Now I try a non-panel approach. I am using areg since Stata won't let me put in ~6K dummies.
. areg PHARVIS ILLDAYS, absorb(hh) vce(cluster hh)
Linear regression, absorbing indicators Number of obs = 27765
F( 1, 5739) = 368.52
Prob > F = 0.0000
R-squared = 0.4579
Adj R-squared = 0.3166
Root MSE = 1.0858
(Std. Err. adjusted for 5740 clusters in hh)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0788618 .0041081 19.20 0.000 .0708084 .0869151
_cons | .2906284 .0115192 25.23 0.000 .2680464 .3132103
-------------+----------------------------------------------------------------
hh | absorbed (5740 categories)
Unfortunately, areg obscures the thing you are interested in. If you use regress and limit the sample so the number of HHs is reasonable, you will get the tiny standard errors for clusters with only 1 villager. This makes sense since the residual for such observations will be exactly zero. Here's an example:
. reg PHARVIS ILLDAYS i.hh if inrange(hh,1,100), cluster(hh)
Linear regression Number of obs = 219
F( 0, 99) = .
Prob > F = .
R-squared = 0.6473
Root MSE = .88177
(Std. Err. adjusted for 100 clusters in hh)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0518095 .0314707 1.65 0.103 -.0106352 .1142542
|
hh |
2 | -1 1.84e-14 -5.4e+13 0.000 -1 -1
3 | .2590475 .1573536 1.65 0.103 -.0531762 .5712712
4 | .4662855 .2832365 1.65 0.103 -.0957171 1.028288
5 | 2.129524 .0786768 27.07 0.000 1.973412 2.285636
6 | 1 1.84e-14 5.4e+13 0.000 1 1
7 | -.585524 .2517657 -2.33 0.022 -1.085082 -.0859662
(snip)....
100 | -.8359366 .0996573 -8.39 0.000 -1.033678 -.6381949
|
_cons | .481905 .3147072 1.53 0.129 -.1425423 1.106352
------------------------------------------------------------------------------
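To check the residual claim directly, a quick sketch run right after the regression above (ehat and nhh are just names I am making up here):

. predict double ehat if e(sample), residuals
. egen nhh = count(ehat), by(hh)
. summarize ehat if nhh == 1

The residuals for households with a single villager in the sample are exactly zero (up to rounding), so those clusters contribute nothing to the clustered variance and their dummies get the ~1e-14 standard errors.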
Now I will cluster on the village, which inflates the standard errors somewhat, as expected, but they remain reasonable:
. reg PHARVIS ILLDAYS i.commune, cluster(commune)
Linear regression Number of obs = 27765
F( 0, 193) = .
Prob > F = .
R-squared = 0.1814
Root MSE = 1.1925
(Std. Err. adjusted for 194 clusters in commune)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ILLDAYS | .0840634 .0056375 14.91 0.000 .0729444 .0951823
|
commune |
2 | -.1885549 .012027 -15.68 0.000 -.2122761 -.1648337
(snip) ....
191 | .4646775 .0014571 318.91 0.000 .4618037 .4675514
192 | -.0020317 .0065782 -0.31 0.758 -.0150061 .0109427
193 | -.2444578 .0115522 -21.16 0.000 -.2672426 -.2216731
194 | .1917803 .0002288 838.33 0.000 .1913291 .1922315
|
_cons | .4371527 .0200739 21.78 0.000 .3975602 .4767452
------------------------------------------------------------------------------
If I drop all other regressors and estimate something like what Stas suggests, I get the (numerically) zero standard errors on the commune dummies:
. reg PHARVIS i.commune, cluster(commune)
Linear regression Number of obs = 27765
F( 0, 193) = .
Prob > F = .
R-squared = 0.0656
Root MSE = 1.274
(Std. Err. adjusted for 194 clusters in commune)
------------------------------------------------------------------------------
| Robust
PHARVIS | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
commune |
2 | -.0092138 1.72e-14 -5.4e+11 0.000 -.0092138 -.0092138
3 | -.2910319 1.72e-14 -1.7e+13 0.000 -.2910319 -.2910319
4 | -.3957457 1.72e-14 -2.3e+13 0.000 -.3957457 -.3957457
5 | -.4244865 1.72e-14 -2.5e+13 0.000 -.4244865 -.4244865
(snip) ....
191 | .4864051 1.72e-14 2.8e+13 0.000 .4864051 .4864051
192 | -.1001229 1.72e-14 -5.8e+12 0.000 -.1001229 -.1001229
193 | -.416719 1.72e-14 -2.4e+13 0.000 -.416719 -.416719
194 | .188369 1.72e-14 1.1e+13 0.000 .188369 .188369
|
_cons | .7364865 1.72e-14 4.3e+13 0.000 .7364865 .7364865
------------------------------------------------------------------------------

-
With negative ICCs, the standard errors can get a little bit smaller, but not numerically zero. – StasK May 16 '14 at 06:34
-
I don't see why this answer was downvoted. Obviously some respectable amount of work went into it. – Andy May 16 '14 at 23:31
-
Dimitriy, thank you so much for the replication. Could you please explain a bit more what you mean by "This makes sense since the residual for such observations will be exactly zero." in "reg PHARVIS ILLDAYS i.hh if inrange(hh,1,100), cluster(hh)"? – user44968 May 17 '14 at 16:59
-
The hh dummy will be equal to the error for that singleton observation. You can see this clearly if you run a regression of y on x and a dummy for observation 7. The prediction will be equal to the actual. – dimitriy May 17 '14 at 17:39
-
@DimitriyV.Masterov could we then use i.hh and robust s.e. instead of cluster(hh)? – Fuca26 May 06 '21 at 20:56
-
@Fuca26 That may fix one problem, but if there is some correlation in the errors for the same observation on different days, the robust SEs will be wrong. This is usually the case in social science settings. Moreover, the singleton dummy issue will persist. – dimitriy May 06 '21 at 21:17