Interpreting different results from pooled OLS and FE

Question

I know there are many similar questions, and I have consulted several of them (e.g. Panel Data: Pooled OLS vs. RE vs. FE Effects; Pooled OLS vs RE and FE; Difference between OLS and FE). Yet I haven't been able to answer my question from them.

I have a panel data set of administrative agents' pay: six years of average monthly pay, so one observation per agent per year. The panel is unbalanced: some agents are observed for all 6 years, some for only 2, etc.

I am trying to understand the difference in pay between agents based on a categorical variable that does not vary over time. I have a few control variables (gender, age category, administrative category, etc.), most of which also do not vary over time.

So far, I have understood that fixed-effects models remove individual characteristics that do not vary over time. In theory, then, this should also remove the categorical variable I'm after, as well as gender etc. However, my R implementation keeps these variables in the model and runs without a warning. I find this quite confusing.
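To make the "removes what does not vary over time" idea concrete, here is a minimal base R sketch of the within (demeaning) transformation. The data frame `panel`, its columns, and the helper `demean` are all made up for illustration; they are not the data or code from the question.

```r
# Hypothetical toy panel: 3 agents observed for 2 or 3 years each.
panel <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3, 3, 3),
  year   = c(2014, 2015, 2016, 2014, 2015, 2014, 2015, 2016),
  female = c(1, 1, 1, 0, 0, 1, 1, 1),                  # constant within each agent
  hours  = c(150, 152, 149, 120, 125, 151, 151, 140)   # varies over time
)

# Within transformation: subtract each agent's own mean from the variable.
# (`demean` is an illustrative helper, not a function from any package.)
demean <- function(x, id) x - ave(x, id)

panel$female_within <- demean(panel$female, panel$id)
panel$hours_within  <- demean(panel$hours,  panel$id)

# A regressor that is constant within every agent becomes a column of zeros
# after demeaning, so it carries no information for the within estimator.
# A time-varying regressor keeps its within-agent variation.
print(panel[, c("id", "year", "female_within", "hours_within")])
```

If a supposedly time-invariant variable still has nonzero demeaned values, it changes for at least one individual in the data, or the individual index is not the one intended.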

library(plm)

# Index the panel by agent ID ("Matricule") and year ("Année").
testdf <- pdata.frame(x = data, index = c("Matricule", "Année"))

reg_fe3 <- plm(form_fe2, data = testdf, model = "within",
               na.action = na.exclude)
summary(reg_fe3)

Unbalanced Panel: n = 2900, T = 1-6, N = 10704

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-5719.553   -87.872     0.000    93.334  9503.712 

Coefficients:
                                 Estimate  Std. Error  t-value  Pr(>|t|)   
origageSTRUCTURés                23.87762    54.99487   0.4342    0.6642    
origageTransféré                423.91081    65.24857   6.4969 8.706e-11 ***
femme                          -169.78465    26.57132  -6.3898 1.757e-10 ***
heures_mens                       8.91896     0.27759  32.1300 < 2.2e-16 ***
age25-45                        224.78853    34.50944   6.5138 7.781e-11 ***
age45+                          427.83609    38.49898  11.1129 < 2.2e-16 ***
CatégorieB                     -985.88232    30.49179 -32.3327 < 2.2e-16 ***
CatégorieC                    -1540.88488    28.89043 -53.3355 < 2.2e-16 ***
StatutSTAGIAIRE_EMPLOIS_AIDES   139.44980    29.80018   4.6795 2.924e-06 ***
StatutTITULAIRE                 262.08616    27.51303   9.5259 < 2.2e-16 ***
Année2015                        27.68404    17.88236   1.5481    0.1216    
Année2016                        73.50181    17.87381   4.1123 3.958e-05 ***
Année2017                       163.60559    17.98322   9.0977 < 2.2e-16 ***
Année2018                       252.94818    18.19012  13.9058 < 2.2e-16 ***
Année2019                       284.27151    18.50541  15.3615 < 2.2e-16 ***
Total Sum of Squares:    2371500000
Residual Sum of Squares: 1407300000
R-Squared:      0.40658
Adj. R-Squared: 0.18457
F-statistic: 355.778 on 15 and 7789 DF, p-value: < 2.22e-16

My problem is that the variable of interest, origageTransféré, gets a large and significant coefficient estimate here, whereas when I run a pooled OLS model with the same control variables, I get a non-significant estimate. The Breusch-Pagan Lagrange multiplier test on the pooled OLS regression rejected the null (very significantly).

All in all, I'm quite confused as to what is appropriate in this case.

Check your data for those variables that do not vary over time per individual. If they do not vary for any individual, they should drop out. Gender is a candidate, but not necessarily. It is not surprising that you get significant results in one model and insignificant results in the other: they are different models, after all. Also check your code: you create data called `test` but you use `testdf` in the estimation. If `testdf` is not a pdata.frame yet, the first two variables are taken as index variables (which can lead to surprising results if these are not the individual and time dimensions). - Helix123, Nov 25 '20 at 20:57
Typically, you would want to perform an F test of the pooling model vs. the FE model to check whether adding FEs is worthwhile; use `pFtest` from package `plm` for that. - Helix123, Nov 25 '20 at 21:01
Thank you for the answer. I have edited the bit of code in the question: `test` and `testdf` are just names used for the question, and the function calls the correct data frame in my actual code. You're right, it did drop them once I redid it properly. Does this then mean that FE models cannot be used if your variable of interest is time-invariant (as it gets dropped, so you won't be able to get a coefficient for it)? - Deter11, Nov 26 '20 at 00:07
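The two checks suggested in the comments can be sketched as follows. This is only an illustration on simulated data: the data frame `sim`, its columns (`origin`, `hours`, `pay`) and all parameter values are assumptions made up for the example, not the data from the question. `pvar` and `pFtest` are functions from package `plm`.

```r
library(plm)

# Simulated (hypothetical) balanced panel: 50 agents, 4 years each.
set.seed(1)
n_id <- 50
n_t  <- 4
sim <- data.frame(
  id   = rep(seq_len(n_id), each = n_t),
  year = rep(2014:2017, times = n_id)
)
alpha      <- rnorm(n_id, sd = 200)                      # unobserved agent effect
sim$origin <- rep(rbinom(n_id, 1, 0.5), each = n_t)      # time-invariant regressor
sim$hours  <- rnorm(nrow(sim), mean = 150, sd = 10)      # time-varying regressor
sim$pay    <- 1500 + 9 * sim$hours + 100 * sim$origin +
  alpha[sim$id] + rnorm(nrow(sim), sd = 50)

# Declare the panel structure explicitly so the index is not guessed.
psim <- pdata.frame(sim, index = c("id", "year"))

# Report which variables lack variation over individuals or over time.
pvar(psim)

pool <- plm(pay ~ hours + origin, data = psim, model = "pooling")
fe   <- plm(pay ~ hours + origin, data = psim, model = "within")

# `origin` is constant within every id, so the within model is expected
# to return a coefficient for `hours` only.
names(coef(fe))

# F test of the FE (within) model against the pooling model.
pFtest(fe, pool)
```

Running `pvar` on the real pdata.frame before estimation is a quick way to see whether a variable believed to be time-invariant does change for some individuals.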
