
I have to run some panel regressions, and because I received the data as a .dta Stata file, I first ran all the regressions in Stata, which went fine. Later I wanted to reproduce these regressions in R, which I much prefer for several reasons. It turned out that R refused to run a fixed-effects regression with both individual and time effects.

Here's some sample data:

   id year type1 type2 var1 var2
1   1 1991     1     1    2   11
2   1 1992     1     1    2   14
3   1 1993     1     1    3   13
4   1 1994     1     1    5   16
5   1 1995     1     1    6   17
6   2 1991     0     1    1   16
7   2 1992     0     1    3   16
8   2 1993     0     1    3   17
9   2 1994     0     1    5   20
10  2 1995     0     1    5   21
11  3 1991     1     0    1   11
12  3 1992     1     0    4   14
13  3 1993     1     0    4   15
14  3 1994     1     0    5   15
15  3 1995     1     0    8   19
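
For reference, the same data can be entered directly in R as a data frame (a sketch, with the column values copied from the listing above):

# Sample panel: 3 individuals (id) observed over 1991-1995
data <- data.frame(
  id    = rep(1:3, each = 5),
  year  = rep(1991:1995, times = 3),
  type1 = rep(c(1, 0, 1), each = 5),   # constant within id
  type2 = rep(c(1, 1, 0), each = 5),   # constant within id
  var1  = c(2, 2, 3, 5, 6,  1, 3, 3, 5, 5,  1, 4, 4, 5, 8),
  var2  = c(11, 14, 13, 16, 17,  16, 16, 17, 20, 21,  11, 14, 15, 15, 19)
)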

To include both individual and time fixed effects in Stata, I run:

. xtreg var2 var1 type1 type2 i.year, fe
note: type1 omitted because of collinearity
note: type2 omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =        15
Group variable: id                              Number of groups   =         3

R-sq:  within  = 0.9133                         Obs per group: min =         5
       between = 0.2879                                        avg =       5.0
       overall = 0.5511                                        max =         5

                                                F(5,7)             =     14.76
corr(u_i, Xb)  = -0.0492                        Prob > F           =    0.0013

------------------------------------------------------------------------------
        var2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        var1 |   .4102564   .4298219     0.95   0.372    -.6061109    1.426624
       type1 |          0  (omitted)
       type2 |          0  (omitted)
             |
        year |
       1992  |   1.316239   1.074077     1.23   0.260    -1.223549    3.856028
       1993  |   1.512821   1.174497     1.29   0.239    -1.264424    4.290065
       1994  |    2.82906   1.767562     1.60   0.154     -1.35056    7.008679
       1995  |   4.282051   2.293279     1.87   0.104    -1.140691    9.704794
             |
       _cons |   12.11966   .8053985    15.05   0.000     10.21519    14.02412
-------------+----------------------------------------------------------------
     sigma_u |  2.1671081
     sigma_e |  .98014477
         rho |  .83017912   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(2, 7) =    21.30                Prob > F = 0.0011

To run the same procedure in R with plm(), I tried:

a <- plm(var2 ~ var1 + type1 + type2, model="within", effect="twoways", data=data)

and got

summary(a)
Error in crossprod(t(X), beta) : non-conformable arguments

So, my question is: why does R have a problem here while Stata does not? Is there really a problem, and if so, how does Stata deal with it?

ttlngr
  • It is ambiguous whether this is a statistical issue or a coding issue (& I don't know Stata). That said, I notice your Stata output includes `type1 omitted because of collinearity`, & `type2 omitted because of collinearity`. I wonder if that's related to the issue. Would R work if you dropped `type1` & `type2`? – gung - Reinstate Monica Dec 09 '15 at 23:47
    Two things: you can load .dta files into R using the `foreign` library. Re: the error message: I've gotten this before. `plm` does a bad job of auto-dropping collinear dummy variables. You could try the `lfe` package, or you could just code the dummies as factors. Depending on how big your dataset is, this might not be infeasible. – generic_user Dec 10 '15 at 00:38
  • @generic_user Using factors did not help. So the actual problem is that R can't drop the collinear variables? The strange thing is that R drops them in a fixed-effects regression with individual effects alone or with time effects alone, but not with both. – ttlngr Dec 10 '15 at 11:24
  • @gung Yes, it works with R when dropping the dummies first. – ttlngr Dec 10 '15 at 11:26
  • @ttlngr blame the plm package, not R itself. Again, try lfe -- I've had better luck with that one. And why would making your dummies into a factor variable not work? – generic_user Dec 10 '15 at 15:35
  • Since this turns out to be an issue w/ how R (or at least this particular function) works, this thread should be closed as off-topic here. – gung - Reinstate Monica Dec 10 '15 at 16:17
    I am usually fairly hawkish on software-specific questions but in this instance there is a statistical question exposed too that keeps the thread of interest. – Nick Cox Dec 10 '15 at 17:36
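
Pulling together the workarounds mentioned in the comments (dropping the collinear dummies before calling plm(), or switching to the lfe package), here is a minimal sketch assuming the data frame constructed above; the explicit index argument and the felm() call are illustrative and not part of the original post:

library(plm)

# Workaround 1: drop type1 and type2 by hand. They are constant within id,
# so the within transformation removes them anyway; without them the
# twoways model estimates fine.
a <- plm(var2 ~ var1, data = data, index = c("id", "year"),
         model = "within", effect = "twoways")
summary(a)

# Workaround 2: lfe::felm() absorbs the id and year fixed effects and is
# reported in the comments to cope better with the collinear dummies.
library(lfe)
b <- felm(var2 ~ var1 + type1 + type2 | id + year, data = data)
summary(b)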

1 Answer


(Turning the comments into an answer so that this thread isn't officially unanswered.)

I notice your Stata output includes type1 omitted because of collinearity and type2 omitted because of collinearity, but R's error message does not indicate anything like that. The nature of multicollinearity is software-independent: it is not possible for a model matrix to be multicollinear in one software package but not another.

Regression models cannot be fit when the model matrix is multicollinear without special 'tricks' being used. The most common thing is for software to drop variables according to some pre-set scheme and return a warning (which Stata has done), or return an error so that you can choose which variables you want to drop or which other steps you want to take. Those strategies are employed in other functions in R, but this seems to be a bug / not implemented well in the plm() function. R has a bug-reporting protocol; you may want to report this.
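
As an illustration of that "drop and warn" strategy, here is a sketch using base R's lm() on the least-squares dummy variable form of the same model (assuming the sample data frame from the question); this is not plm() code, just a demonstration of how aliased terms are handled:

# lm() notices that type1 and type2 are linear combinations of the id
# dummies: it reports NA coefficients for them instead of failing.
lsdv <- lm(var2 ~ var1 + type1 + type2 + factor(id) + factor(year),
           data = data)
coef(lsdv)     # type1 and type2 come back as NA
alias(lsdv)    # shows which terms are linearly dependent on the rest

# The rank deficiency can also be checked on the model matrix directly:
X <- model.matrix(~ var1 + type1 + type2 + factor(id) + factor(year),
                  data = data)
qr(X)$rank < ncol(X)   # TRUE: the columns are linearly dependent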

gung - Reinstate Monica
  • See http://stats.stackexchange.com/questions/16327/testing-for-linear-dependence-among-the-columns-of-a-matrix/39321#39321 for some R code showing how to check for linearly dependent columns in a model.matrix. Also: this is not a bug in R, rather a shortcoming of the plm package, which does not print a more specific warning. – Helix123 Dec 10 '15 at 16:40
  • @Helix123, I am familiar w/ linear dependence in model matrices. The discussion of bug reporting in the FAQ encompasses both bugs in R itself & in contributed packages. – gung - Reinstate Monica Dec 10 '15 at 16:46