
I know there are many similar questions, several of which I have consulted (e.g. Panel Data: Pooled OLS vs. RE vs. FE Effects; Pooled OLS vs RE and FE; Difference between OLS and FE), yet I haven't been able to answer my question from them.

I have a panel data set of administrative agents' pay. I have six years of average monthly pay (thus one observation per agent per year). The panel is unbalanced: some agents are present for all 6 years, some for only 2, etc.

I am trying to understand the difference in pay between agents based on a categorical variable that does not vary over time. I have a few control variables (gender, age categories, admin category, etc.), most of which also do not vary over time.

So far, I have understood that fixed-effects models remove individual characteristics that do not vary over time. So in theory this should also remove the categorical variable I'm after, as well as gender etc. However, my R implementation keeps these variables and runs the model without a warning. I find this quite confusing.
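To make that expectation concrete, here is a minimal base-R sketch of the within (demeaning) transformation on made-up toy data. The data frame `toy`, its columns and the helper `demean` are hypothetical and only illustrate the mechanics; this is not the pay data and not what `plm` runs internally line for line.

```r
# Toy panel (hypothetical values): 3 agents observed for 2-3 years each
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  femme = c(1, 1, 1, 0, 0, 1, 1, 1),                  # constant within each agent
  hours = c(150, 152, 149, 140, 151, 160, 158, 161)   # changes within each agent
)

# Within transformation: subtract each agent's own mean from every observation
demean <- function(x, g) x - ave(x, g)

toy$femme_within <- demean(toy$femme, toy$id)
toy$hours_within <- demean(toy$hours, toy$id)

# A regressor that is constant within every agent becomes a column of zeros,
# so its coefficient cannot be estimated by the within estimator
all(toy$femme_within == 0)

# A regressor that changes within agents keeps some variation after demeaning
any(toy$hours_within != 0)
```

If a supposedly time-invariant variable survives in the real model, this suggests that it changes for at least some agents in the data, or that the panel index is not the one intended.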

testdf <- pdata.frame(x = data, index = c("Matricule", "Année"))

reg_fe3 <- plm(form_fe2,data = testdf, model = "within",
               na.action = na.exclude)

Unbalanced Panel: n = 2900, T = 1-6, N = 10704

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-5719.553   -87.872     0.000    93.334  9503.712 

Coefficients:
                                 Estimate  Std. Error  t-value  Pr(>|t|)   
origageSTRUCTURés                23.87762    54.99487   0.4342    0.6642    
origageTransféré                423.91081    65.24857   6.4969 8.706e-11 ***
femme                          -169.78465    26.57132  -6.3898 1.757e-10 ***
heures_mens                       8.91896     0.27759  32.1300 < 2.2e-16 ***
age25-45                        224.78853    34.50944   6.5138 7.781e-11 ***
age45+                          427.83609    38.49898  11.1129 < 2.2e-16 ***
CatégorieB                     -985.88232    30.49179 -32.3327 < 2.2e-16 ***
CatégorieC                    -1540.88488    28.89043 -53.3355 < 2.2e-16 ***
StatutSTAGIAIRE_EMPLOIS_AIDES   139.44980    29.80018   4.6795 2.924e-06 ***
StatutTITULAIRE                 262.08616    27.51303   9.5259 < 2.2e-16 ***
Année2015                        27.68404    17.88236   1.5481    0.1216    
Année2016                        73.50181    17.87381   4.1123 3.958e-05 ***
Année2017                       163.60559    17.98322   9.0977 < 2.2e-16 ***
Année2018                       252.94818    18.19012  13.9058 < 2.2e-16 ***
Année2019                       284.27151    18.50541  15.3615 < 2.2e-16 ***
Total Sum of Squares:    2371500000
Residual Sum of Squares: 1407300000
R-Squared:      0.40658
Adj. R-Squared: 0.18457
F-statistic: 355.778 on 15 and 7789 DF, p-value: < 2.22e-16

My problem is that here the variable of interest, origageTransféré, gets a large and significant estimated coefficient, whereas when I run a pooled OLS model with the same control variables, I get a non-significant estimate. The Breusch-Pagan Lagrange multiplier test on the pooled OLS regression rejected the null (very significantly).

All in all, I'm quite confused as to what is appropriate in this case.

Deter11
  • Check your data for those variables that do not vary over time per individual. If a variable is constant over time for every individual, it should drop out. Gender is a candidate, but not necessarily. It is not surprising that you get significant results in one model and insignificant results in the other: they are different models after all. Also check your code: you create data called `test` but you use `testdf` in the estimation. If `testdf` is not a pdata.frame yet, the first two variables are taken as index variables (which can lead to surprising results if these are not the individual and time dimensions). – Helix123 Nov 25 '20 at 20:57
  • Typically, you would want to perform an F test of the pooling model vs. the FE model to check whether adding FEs is worthwhile; use `pFtest` from package `plm` for that. – Helix123 Nov 25 '20 at 21:01
  • Thank you for the answer. I have edited the bit of code in the question, because `test` and `testdf` were just names used for the question; the function calls the correct data frame in my actual code. You're right, it did drop them when I redid it properly. Does this mean that FE models cannot be used if your variable of interest is time-invariant (since it gets dropped, you cannot get a coefficient for it)? – Deter11 Nov 26 '20 at 00:07
  • Yes. See any textbook about panel data models. – Helix123 Nov 26 '20 at 16:53
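
The check suggested in the comments (which variables actually vary over time within an individual) can be sketched in base R as follows. The helper `varies_within` and the toy data are hypothetical and for illustration only; the column names merely mimic those in the question. The `plm` package also provides `pvar()` for inspecting individual and time variation, which may be more convenient on a real pdata.frame.

```r
# Hypothetical helper: TRUE if the variable takes more than one value
# for at least one individual, i.e. it has some within-individual variation
varies_within <- function(x, id) {
  any(tapply(x, id, function(v) length(unique(v)) > 1))
}

# Toy data (made up): femme is constant per agent, Statut changes for agent 1
toy <- data.frame(
  Matricule = c(1, 1, 2, 2, 3, 3),
  femme     = c(1, 1, 0, 0, 1, 1),
  Statut    = c("STAGIAIRE", "TITULAIRE", "TITULAIRE",
                "TITULAIRE", "CONTRACTUEL", "CONTRACTUEL")
)

# One logical per variable; FALSE means the within estimator cannot identify it
res <- sapply(toy[c("femme", "Statut")], varies_within, id = toy$Matricule)
print(res)
```

A variable flagged `FALSE` here is absorbed by the individual effects, which matches the conclusion reached in the comments.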
