I am trying to estimate robust standard errors in a panel data regression. I understand panel data regressions conceptually, but R offers many options that I am unsure about. My data (an unbalanced panel) have the following format:
id time name y x1 x2
1 10 A 1.28233854 -0.42411039 1.89640596
1 11 A -0.59541995 -0.43214374 0.07386285
1 12 A 0.88951720 -1.55417836 0.28276157
2 10 B 1.11211744 -0.89200195 0.88989664
2 11 B -0.37737953 0.09055494 1.20764357
3 10 C 0.03258314 -0.13834344 -0.97812765
3 11 C -0.97645525 -0.14313482 -1.03528695
3 12 C -0.02031554 0.02061293 -0.71353867
Here is the R code to create the data (y, x1 and x2 are random draws, so the values will differ from those shown above):
x <- data.frame(
  id   = rep(c(1, 2, 3), c(3, 2, 3)),
  time = c(10, 11, 12, 10, 11, 10, 11, 12),
  name = rep(c("A", "B", "C"), c(3, 2, 3)),
  y    = rnorm(8),
  x1   = rnorm(8),
  x2   = rnorm(8)
)
To fit the regressions and compute the robust standard errors, I use:
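library(plm)       # panel estimators and the vcovHC method for plm objects
library(sandwich)  # vcovHC generic
library(lmtest)    # coeftest()
# The data frame is passed explicitly via data =, so attach()/detach() is not needed.
# Pooling:
r1 <- plm(y ~ x1 + x2, data = x, model = "pooling", index = c("id", "time"))
r1
# Robust standard errors, clustered by group (id):
coeftest(r1, vcov = vcovHC(r1, type = "HC0", cluster = "group"))
# Fixed effects:
r2 <- plm(y ~ x1 + x2, data = x, model = "within", index = c("id", "time"))
r2
# Robust standard errors, clustered by group (id):
coeftest(r2, vcov = vcovHC(r2, type = "HC0", cluster = "group"))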
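For comparison, clustering by time instead of by group only changes the cluster argument of vcovHC. The following is a minimal sketch on simulated data: the data frame d, its size (10 ids, 5 periods) and the coefficients used to generate y are made up for illustration and are not my real data.

```r
library(plm)
library(lmtest)

# Assumption: a small balanced panel simulated purely for illustration
set.seed(42)
d <- data.frame(
  id   = rep(1:10, each = 5),
  time = rep(1:5, times = 10),
  x1   = rnorm(50),
  x2   = rnorm(50)
)
d$y <- 0.5 * d$x1 - 0.3 * d$x2 + rnorm(50)

# Fixed effects (within) model with individual effects
fe <- plm(y ~ x1 + x2, data = d, model = "within", index = c("id", "time"))

# Same model, two different clusterings of the covariance matrix
vc_group <- vcovHC(fe, type = "HC0", cluster = "group")
vc_time  <- vcovHC(fe, type = "HC0", cluster = "time")

coeftest(fe, vcov = vc_group)  # clustered by id
coeftest(fe, vcov = vc_time)   # clustered by time period
```

The point estimates are the same in both tables; only the standard errors, and therefore the test statistics, depend on the clustering.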
My questions are the following:
1) Is it correct to cluster by group in both the pooling model and the fixed effects model? I could also cluster by time. My concern is that the fixed effects model uses only the within-group variation over time, so, as I understand it, clustering the standard errors by group would not make sense under this approach.
2) The effect argument of plm has three options: "individual", "time" or "twoways". I could not find a good explanation of which effect to use with which model. Could someone tell me which effect to use in the simple model above, for both the within and the pooling model?
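To make the second question concrete, here is how I understand the three effects would be specified for the within model. This is only a sketch on simulated data (the data frame d and the generating coefficients are hypothetical, not my real data). As far as I can tell, "individual" removes id-specific means, "time" removes period-specific means, and "twoways" removes both.

```r
library(plm)

# Assumption: a small balanced panel simulated purely for illustration
set.seed(42)
d <- data.frame(
  id   = rep(1:10, each = 5),
  time = rep(1:5, times = 10),
  x1   = rnorm(50),
  x2   = rnorm(50)
)
d$y <- 0.5 * d$x1 - 0.3 * d$x2 + rnorm(50)

# One within model per effect; everything else is held fixed
fe_ind  <- plm(y ~ x1 + x2, data = d, model = "within",
               effect = "individual", index = c("id", "time"))
fe_time <- plm(y ~ x1 + x2, data = d, model = "within",
               effect = "time", index = c("id", "time"))
fe_two  <- plm(y ~ x1 + x2, data = d, model = "within",
               effect = "twoways", index = c("id", "time"))

# Slope estimates side by side, one column per effect
sapply(list(individual = fe_ind, time = fe_time, twoways = fe_two), coef)
```

What I cannot judge is which of these three is appropriate for my data, and whether the choice should influence how the standard errors are clustered.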