
I am analysing a panel data set covering 20 years and 55 counties, and I want to run a fixed-effects panel regression. I started with lm and dummy variables; the R-squared and the adjusted R-squared were both around 0.4. Then I heard about plm and tried it: the R-squared dropped drastically and, even worse, the adjusted R-squared became negative. The data and the function calls are below. Please let me know if I am doing something wrong.

My data frame:

df <- structure(list(table.id = c("alameda, ca, 2003", "alameda, ca, 2004", 
"alameda, ca, 2005"), location = c("Alameda, CA", "Alameda, CA", 
"Alameda, CA"), year = c(2003, 2004, 2005), search.fund = c(0, 
0, 0), search.fund.binary = c(0, 0, 0), time.avg = c(0, 0, 0), 
    distance.avg = c(0, 0, 0), avg.income.capita = c(40266, 41973, 
    43594), real.gdp = c(86355025, 88443534, 90705419), unemployment = c(6.8, 
    5.9, 5.1), education.rate = c(34.9, 34.9, 34.9), urban.id = c(1, 
    1, 1), no.establishments = c(46548.75, 46623.5, 46254.25), 
    no.building.permit = c(14828, 15239, 14883), population.size = c(1454163, 
    1445721, 1441545), no.establishments.capita = c(0.0320106824338124, 
    0.0322493067472908, 0.0320865807172166), no.building.permit.capita = c(0.0101969311555857, 
    0.0105407613225512, 0.0103243395107333)), row.names = c(NA, 
3L), class = "data.frame")

lm model with dummy variables:

sf.lm.fe.nb <- lm(search.fund ~ education.rate + unemployment +
    urban.id + no.establishments.capita + no.building.permit.capita +
    factor(location) + factor(year), data = df)
summary(sf.lm.fe.nb)

plm model:

library(plm)  # provides plm()

sf.plm.fe.nb <- plm(search.fund ~ education.rate + unemployment +
    urban.id + no.establishments.capita + no.building.permit.capita,
    data = df, model = "within", effect = "twoways",
    index = c("location", "year"))

summary(sf.plm.fe.nb)
  • Relevant? https://stats.stackexchange.com/questions/444041/negative-adjusted-r2-in-twoway-effects-within-model/444126#444126 – Christoph Hanck Nov 17 '20 at 08:53
  • That helps, but I still don't understand how to improve the data to increase the R-squared from plm, or whether I can treat the lm results as valid. – Urban Dremelj Nov 17 '20 at 13:03
  • I have trouble running your code. Please try from a clean environment to make it a MWE. As my other answer indicates, you fit many parameters in a panel data model (individual effects, time effects) which should be counted in adjusted $R^2$. So it is possible that it is negative. I am not sure what you mean by "improve" data - you could try and collect more, if possible, but generally your data is given. – Christoph Hanck Nov 17 '20 at 14:37
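
The point made in the last comment can be checked on simulated data. The following is a minimal sketch in base R, not the asker's data: the panel dimensions (55 units, 20 years) mirror the question, but the variables `x` and `y`, the effect sizes, and the helper `within2` are all hypothetical. It assumes a balanced panel, and the degrees-of-freedom correction at the end is one plausible choice that may differ from what plm reports.

```r
# Simulated balanced panel: 55 units x 20 years (hypothetical data, not the asker's).
set.seed(1)
n_id <- 55
n_t  <- 20
d <- expand.grid(id = factor(seq_len(n_id)), year = factor(seq_len(n_t)))
unit_fx <- rnorm(n_id, sd = 2)[as.integer(d$id)]    # unit-level intercepts
time_fx <- rnorm(n_t,  sd = 1)[as.integer(d$year)]  # year-level intercepts
d$x <- rnorm(nrow(d))
d$y <- 0.1 * d$x + unit_fx + time_fx + rnorm(nrow(d))

# (a) Dummy-variable (LSDV) fit: the dummies count towards explained variance.
lsdv <- lm(y ~ x + id + year, data = d)

# (b) Two-way within transformation; this simple formula assumes a balanced panel.
within2 <- function(v, id, tm) v - ave(v, id) - ave(v, tm) + mean(v)
d$y_w <- within2(d$y, d$id, d$year)
d$x_w <- within2(d$x, d$id, d$year)
fe <- lm(y_w ~ x_w - 1, data = d)

# The slope on x is the same either way ...
slope_lsdv <- unname(coef(lsdv)["x"])
slope_fe   <- unname(coef(fe)["x_w"])

# ... but each R-squared is measured against a different total sum of squares.
r2_lsdv   <- summary(lsdv)$r.squared
r2_within <- 1 - sum(resid(fe)^2) / sum(d$y_w^2)

# One possible degrees-of-freedom correction (an assumption; plm's may differ):
# charge the absorbed unit and time effects as estimated parameters.
n <- nrow(d)
adj_r2_within <- 1 - (1 - r2_within) * (n - 1) / df.residual(lsdv)

print(c(slope_lsdv = slope_lsdv, slope_fe = slope_fe))
print(c(r2_lsdv = r2_lsdv, r2_within = r2_within, adj_r2_within = adj_r2_within))
```

In this sketch the two fits give the same slope, so the gap in R-squared comes only from what counts as "explained": the dummy-variable fit credits the unit and year dummies, while the within version measures only what `x` explains after those effects are removed. The within R-squared can never exceed the dummy-variable one, and once the absorbed effects are charged as parameters the adjusted value can drop below zero when the regressors explain little within-unit variation. A low or negative value is therefore not by itself evidence of a coding mistake.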
