I know there are many similar questions, which I have consulted (e.g. "Panel Data: Pooled OLS vs. RE vs. FE Effects"; "Pooled OLS vs RE and FE"; "Difference between OLS and FE"), yet I haven't been able to answer my question from them.
I have a panel data set of administrative agents' pay. It covers six years of average monthly pay, so there is one observation per agent per year. The panel is unbalanced: some agents are observed for all 6 years, others for only 2, and so on.
I am trying to understand the difference in pay between agents based on a categorical variable that does not vary over time. I have a few control variables (gender, age categories, admin category, etc.), most of which also do not vary over time.
So far, I have understood that fixed-effects models remove individual characteristics that do not vary over time. In theory, this should also remove the categorical variable I'm after, as well as gender and the other time-invariant controls. However, my R implementation of the model (using plm) keeps them and runs without a warning. I find this quite confusing.
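library(plm)
# Declare the panel structure: agent ID (Matricule) and year (Année)
testdf <- pdata.frame(x = data, index = c("Matricule", "Année"))
reg_fe3 <- plm(form_fe2, data = testdf, model = "within",
               na.action = na.exclude)
summary(reg_fe3)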
Unbalanced Panel: n = 2900, T = 1-6, N = 10704

Residuals:
     Min.    1st Qu.     Median    3rd Qu.       Max.
-5719.553    -87.872      0.000     93.334   9503.712

Coefficients:
                                 Estimate Std. Error  t-value  Pr(>|t|)
origageSTRUCTURés                23.87762   54.99487   0.4342    0.6642
origageTransféré                423.91081   65.24857   6.4969 8.706e-11 ***
femme                          -169.78465   26.57132  -6.3898 1.757e-10 ***
heures_mens                       8.91896    0.27759  32.1300 < 2.2e-16 ***
age25-45                        224.78853   34.50944   6.5138 7.781e-11 ***
age45+                          427.83609   38.49898  11.1129 < 2.2e-16 ***
CatégorieB                     -985.88232   30.49179 -32.3327 < 2.2e-16 ***
CatégorieC                    -1540.88488   28.89043 -53.3355 < 2.2e-16 ***
StatutSTAGIAIRE_EMPLOIS_AIDES   139.44980   29.80018   4.6795 2.924e-06 ***
StatutTITULAIRE                 262.08616   27.51303   9.5259 < 2.2e-16 ***
Année2015                        27.68404   17.88236   1.5481    0.1216
Année2016                        73.50181   17.87381   4.1123 3.958e-05 ***
Année2017                       163.60559   17.98322   9.0977 < 2.2e-16 ***
Année2018                       252.94818   18.19012  13.9058 < 2.2e-16 ***
Année2019                       284.27151   18.50541  15.3615 < 2.2e-16 ***

Total Sum of Squares:    2371500000
Residual Sum of Squares: 1407300000
R-Squared:      0.40658
Adj. R-Squared: 0.18457
F-statistic: 355.778 on 15 and 7789 DF, p-value: < 2.22e-16
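To make the expectation about time-invariant regressors concrete, here is a minimal base R sketch on made-up data. The data frame `toy`, its columns, and the helper `within_demean` are hypothetical and are not part of my real data or of plm. It only illustrates the within transformation: a regressor that is constant within every agent is demeaned to zero, whereas a dummy that changes for even one agent keeps some within variation. This would be one possible reason for a supposedly time-invariant variable to survive in a within model, though I have not confirmed that this is what happens in my data.

```r
# Toy unbalanced panel; all names and values are made up for illustration.
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year  = c(2014, 2015, 2016, 2014, 2015, 2014, 2015, 2016, 2017),
  group = c("a", "a", "a", "b", "b", "a", "a", "b", "b"),  # agent 3 switches group
  femme = c(0, 0, 0, 1, 1, 1, 1, 1, 1)                     # constant within agent
)

# Within transformation: subtract each agent's own mean from the variable.
within_demean <- function(x, id) x - ave(x, id)

# A regressor that is constant within every agent is demeaned to all zeros,
# so the within estimator cannot identify its coefficient.
femme_within <- within_demean(toy$femme, toy$id)

# A dummy that changes for at least one agent keeps some within variation,
# so its coefficient is still estimated (from the switchers only).
group_b_within <- within_demean(as.numeric(toy$group == "b"), toy$id)

# Number of distinct values each agent takes; any count above 1 means the
# variable is not time-invariant in the data.
n_levels_per_id <- tapply(toy$group, toy$id, function(g) length(unique(g)))
switchers <- names(n_levels_per_id)[n_levels_per_id > 1]
```

The same `tapply` check could be run on the real data with `Matricule` as the grouping variable to see whether `origage`, `femme`, and the other controls are truly constant within each agent.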
My problem is that the variable of interest, origageTransféré, gets a large and highly significant estimated coefficient here, whereas a pooled OLS model with the same control variables gives a non-significant estimate. The Breusch-Pagan Lagrange multiplier test on the pooled OLS regression strongly rejects the null.
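For reference, this is a minimal sketch of the Breusch-Pagan LM statistic as I understand it, written in base R on simulated data. It assumes a balanced panel (unlike my real data), and every name and number in it is illustrative. As far as I understand, rejecting the null only indicates that individual effects are present (pooled OLS versus random effects); it does not by itself say anything about fixed versus random effects. In plm, the corresponding test is available as `plmtest(..., type = "bp")`.

```r
# Simulated balanced panel; every name and number is illustrative only.
set.seed(42)
n <- 50       # number of individuals
T_per <- 4    # periods per individual (balanced panel assumed)
id <- rep(seq_len(n), each = T_per)
x <- rnorm(n * T_per)
alpha <- rep(rnorm(n, sd = 2), each = T_per)  # unobserved individual effect
y <- 1 + 0.5 * x + alpha + rnorm(n * T_per)

# Residuals from pooled OLS, which ignores the individual effect
e <- resid(lm(y ~ x))

# Breusch-Pagan LM statistic for a balanced panel:
# LM = nT / (2 (T - 1)) * ( sum_i (sum_t e_it)^2 / sum_it e_it^2 - 1 )^2
ratio <- sum(tapply(e, id, sum)^2) / sum(e^2)
lm_stat <- n * T_per / (2 * (T_per - 1)) * (ratio - 1)^2

# Under the null of no individual effects, LM is chi-squared with 1 df
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)
```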
All in all, I'm quite confused about which model is appropriate in this case.