The plm() function of the plm package in R is giving me grief over having duplicate time-id couples, even when I'm running a model that I don't think should need a time variable at all (see reproducible example below).
I can think of three possibilities:
1. My understanding of fixed effects regression is wrong, and fixed effects models really do require unique time indices (or time indices at all!).
2. plm() is just being overly finicky here and should relax this requirement.
3. The particular estimation technique that plm() uses (the within transformation) requires time indices, even though the order doesn't seem to matter and the less computationally efficient version (including dummies in a straight-up OLS model) doesn't need them.
Any thoughts?
set.seed(1)
n <- 1000
test <- data.frame( grp = as.factor(rep( letters, (n/length(letters))+1 ))[seq(n)], x = runif(n), z = runif(n) )
test$y <- with( test, 2*x + 3*z + rnorm(n) )
lm( y ~ x + z, data = test )
lm( y ~ x + z + grp, data = test )
library(plm)
# Model fails if I don't specify a time index, despite effect = "individual"
plm( y ~ x + z, data = test, model = "within", effect="individual", index = "grp" )
# Create a time variable and add it to the index, but still specify individual FE only (no time FE)
library(plyr)
test <- ddply( test, .(grp), function(dat) transform( dat, t = seq(nrow(dat)) ) )
# Now plm() works; note coefficients clearly include the fixed effects, as they match the lm() version above
plm( y ~ x + z, data = test, model = "within", effect="individual", index = c("grp","t") )
# Scramble time variables and show they don't matter as long as they're unique within a cluster
test <- ddply( test, .(grp), function(dat) transform( dat, t = sample(t) ) )
plm( y ~ x + z, data = test, model = "within", effect="individual", index = c("grp","t") )
# Add a duplicate time entry and show that it causes plm() to fail
test[ 2, "t" ] <- test[ 1, "t" ]
plm( y ~ x + z, data = test, model = "within", effect="individual", index = c("grp","t") )
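For comparison, here is a minimal sketch of the within transformation done by hand in base R, with no time variable anywhere: demean y, x and z within each group, then run OLS without an intercept. It rebuilds the same simulated data as above so it runs on its own; the `demean` helper and the `_w` column names are my own illustrative choices, not anything from plm.

```r
set.seed(1)
n <- 1000
test <- data.frame( grp = as.factor(rep( letters, (n/length(letters))+1 ))[seq(n)], x = runif(n), z = runif(n) )
test$y <- with( test, 2*x + 3*z + rnorm(n) )

# Hypothetical helper: subtract the group mean from a variable (no time index involved)
demean <- function(v, g) v - ave(v, g)

test_w <- data.frame( y_w = demean(test$y, test$grp),
                      x_w = demean(test$x, test$grp),
                      z_w = demean(test$z, test$grp) )

# OLS on the demeaned data, without an intercept
fit_within <- lm( y_w ~ x_w + z_w - 1, data = test_w )

# Dummy-variable version for comparison
fit_dummy <- lm( y ~ x + z + grp, data = test )

# The slope estimates should agree up to floating-point error
all.equal( unname(coef(fit_within)), unname(coef(fit_dummy)[c("x","z")]) )
```

Note that the standard errors from `fit_within` are not directly comparable, since lm() doesn't know that degrees of freedom were used up by the group means.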
Why this matters
I'm trying to bootstrap my model, and when I do, the requirement that the id-time pairs be unique causes headaches, which seem unnecessary if (2) is true.
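To make the headache concrete, here is a rough sketch of the workaround I'd otherwise need, assuming a simple row-level bootstrap (resampling whole groups would also need fresh group ids). After resampling, it rebuilds a within-group counter so the (grp, t) pairs are unique again. The `boot_once` helper is hypothetical, and the plm() call is guarded so the sketch still runs without the package installed.

```r
set.seed(1)
n <- 1000
test <- data.frame( grp = as.factor(rep( letters, (n/length(letters))+1 ))[seq(n)], x = runif(n), z = runif(n) )
test$y <- with( test, 2*x + 3*z + rnorm(n) )

# Hypothetical helper: one bootstrap replicate with a rebuilt time index
boot_once <- function(dat) {
  # Resample rows with replacement (assumption: row-level bootstrap)
  b <- dat[ sample(nrow(dat), replace = TRUE), ]
  # Number the rows 1, 2, ... within each group so (grp, t) is unique
  b$t <- ave( seq_len(nrow(b)), b$grp, FUN = seq_along )
  b
}

b <- boot_once(test)

# 0 means no duplicated (grp, t) pairs remain
anyDuplicated( b[ , c("grp","t") ] )

# Fit the individual-FE model on the replicate if plm is available
if (requireNamespace("plm", quietly = TRUE)) {
  fit <- plm::plm( y ~ x + z, data = b, model = "within", effect = "individual", index = c("grp","t") )
  print(coef(fit))
}
```

This relies on the observation above that the values of t don't seem to matter for effect = "individual" as long as they are unique within a group.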