I have unbalanced panel data and want to fit a regression model with a firm's degree of internationalization as the dependent variable, measured as the ratio of foreign sales to total sales (fsts). One of my control dummy variables (the firm's industry, gind) is time invariant. Because it does not vary over time, the industry dummy is dropped automatically from my fixed effects model:
    library(plm)

    fixed <- plm(fsts ~ firm_size + rota + debt_to_assets + r_d_intensity + factor(gind),
                 data = firm_ceo, index = c("gvkey", "fyear"), model = "within")
    summary(fixed)
    Oneway (individual) effect Within Model

    Call:
    plm(formula = fsts ~ firm_size + rota + debt_to_assets + r_d_intensity +
        factor(gind), data = firm_ceo, model = "within", index = c("gvkey",
        "fyear"))

    Unbalanced Panel: n = 944, T = 1-7, N = 4433

    Residuals:
           Min.     1st Qu.      Median     3rd Qu.        Max.
    -5.2679e-01 -1.1023e-02 -6.1760e-06  1.0698e-02  6.7683e-01

    Coefficients:
                      Estimate  Std. Error t-value Pr(>|t|)
    firm_size       0.00031128  0.00339678  0.0916 0.926989
    rota           -0.05715082  0.01738153 -3.2880 0.001019 **
    debt_to_assets -0.00145714  0.00930919 -0.1565 0.875627
    r_d_intensity   0.00726665  0.03168240  0.2294 0.818603
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Total Sum of Squares:    11.145
    Residual Sum of Squares: 11.093
    R-Squared:      0.0046326
    Adj. R-Squared: -0.26584
    F-statistic: 4.05495 on 4 and 3485 DF, p-value: 0.0027785
However, I ran a Hausman test, as suggested in "Panel data model estimation with dummy variables", and it favored the fixed effects model over the random effects model (the two models compared did not include the industry dummy).
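For reference, a minimal sketch of that kind of comparison on simulated data. The data frame `d`, the coefficients used to generate it, and the reduced set of two regressors are made up for illustration and are not my actual data. Only `plm()` and `phtest()` come from the plm package.

```r
library(plm)

set.seed(1)

# Simulated stand-in for firm_ceo: 50 firms observed over up to 6 years
d <- expand.grid(gvkey = 1:50, fyear = 2010:2015)
d <- d[runif(nrow(d)) > 0.2, ]        # drop rows at random so the panel is unbalanced

alpha <- rnorm(50)                    # unobserved, time-invariant firm effect
d$firm_size <- alpha[d$gvkey] + rnorm(nrow(d))   # regressor correlated with the firm effect
d$rota      <- rnorm(nrow(d))
d$fsts      <- 0.5 * d$firm_size - 0.3 * d$rota + alpha[d$gvkey] + rnorm(nrow(d))

# Same specification estimated as fixed effects (within) and random effects
fe <- plm(fsts ~ firm_size + rota, data = d,
          index = c("gvkey", "fyear"), model = "within")
re <- plm(fsts ~ firm_size + rota, data = d,
          index = c("gvkey", "fyear"), model = "random")

# Hausman test: a small p-value speaks against the random effects assumption
ht <- phtest(fe, re)
print(ht)
```

Both models have to use the same time-varying regressors for the comparison to make sense, which is why the industry dummy was left out.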
As mentioned, though, I cannot use a plain fixed effects model, because industry is very important to my research.
If I used the random effects model instead, would the interpretation of model fit and coefficients differ substantially? Alternatively, how can I keep time-invariant variables in a fixed effects model, specifically in R? The answers to "How to keep time invariant variables in a fixed effects model" unfortunately focus on the interpretation of gender and do not help with the implementation in R.
[EDIT]
The answer seems to be a correlated random effects (CRE) model, which combines fixed and random effects. It is also known as the within-between model, the Mundlak procedure, or the hybrid approach. For anyone with the same problem, I recommend this paper: https://www.researchgate.net/publication/336608555_On_Ignoring_the_Random_Effects_Assumption_in_Multilevel_Models_Review_Critique_and_Recommendations which comes with a short and intuitive explanatory video: https://www.youtube.com/watch?v=mnMB8MnBlqI
Unfortunately, I am having trouble coding this in R. If you have seen this before and can help, please see my Stack Overflow question on how to implement CRE for unbalanced panels in R: https://stackoverflow.com/questions/68040949/how-to-calculate-the-cluster-means-of-variables-in-a-correlated-random-effects-m
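As a starting point, here is a minimal sketch of a Mundlak-style specification as I understand it, run on simulated data rather than on firm_ceo. The data frame `d`, the three industry codes, the generating coefficients, and the `*_mean` column names are all assumptions made for illustration. The random intercept is fitted with `nlme::lme()` instead of plm, which is a substitution on my part and not something the paper prescribes. The firm-level means are computed with `ave()`, which averages over whatever years each firm is actually observed, so it also works for unbalanced panels.

```r
library(nlme)

set.seed(42)
n_firms <- 60

# Simulated unbalanced panel (hypothetical stand-in for firm_ceo)
d <- expand.grid(gvkey = seq_len(n_firms), fyear = 2010:2016)
d <- d[runif(nrow(d)) > 0.25, ]                 # drop rows at random

alpha        <- rnorm(n_firms)                  # unobserved firm effect
gind_by_firm <- sample(c("10", "20", "30"), n_firms, replace = TRUE)

d$gind      <- factor(gind_by_firm[d$gvkey])    # time-invariant industry code
d$firm_size <- alpha[d$gvkey] + rnorm(nrow(d))  # correlated with the firm effect
d$rota      <- rnorm(nrow(d))
d$fsts      <- 0.5 * d$firm_size - 0.3 * d$rota + 0.4 * (d$gind == "20") +
               alpha[d$gvkey] + rnorm(nrow(d))

# Cluster (firm) means of the time-varying regressors over the observed years
d$firm_size_mean <- ave(d$firm_size, d$gvkey)
d$rota_mean      <- ave(d$rota, d$gvkey)

# Mundlak / CRE: random intercept per firm, plus the firm means and the
# time-invariant industry factor as additional regressors
cre <- lme(fsts ~ firm_size + rota + firm_size_mean + rota_mean + gind,
           random = ~ 1 | gvkey, data = d)

# Fixed effects benchmark via firm dummies (gind cannot be included here)
lsdv <- lm(fsts ~ firm_size + rota + factor(gvkey), data = d)

# Coefficients on the time-varying regressors from both approaches
comparison <- cbind(cre    = fixef(cre)[c("firm_size", "rota")],
                    within = coef(lsdv)[c("firm_size", "rota")])
print(comparison)
print(fixef(cre))   # also contains the gind coefficients
```

The point of the comparison is that the coefficients on the time-varying regressors should match the fixed effects estimates, while the model still returns coefficients for gind. Those gind coefficients are between-firm estimates and still rely on the random effects assumption for industry.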