
I'm analyzing data in R to see how several variables affect the test scores (Value) of different countries. Because the countries are observed over different time periods, I need a panel technique for the regression. I'm using a fixed-effects model, specifically one without a constant, in which I include a dummy variable for every country. I removed the intercept from the equation to eliminate collinearity.

When I ran the regression, I got statistics for every variable except one, which shows "NA" in R. I think my model still suffers from collinearity, and I think the fix is simply to drop one dummy variable from the equation. I did that, and I no longer get "(1 not defined because of singularities)". I just wanted to double-check with someone else whether I did the right thing. This is my code:

library(stats)
library(lmtest)
library(dplyr)
library(plm)

# Fixed effects via country dummies: one dummy per country, no intercept

Data_we_using2$LOCATION <- as.factor(Data_we_using2$LOCATION)

dummyregress <- lm(Value ~ Boysdummy + RGDPCapita + Graduation_rates + MissingGraDummy + Mortality_rate + GovtExpen + MissingGovt + Private + LOCATION - 1, data = Data_we_using2)

summary(dummyregress)

LOCATION holds the country dummy variables. The pictures I included show the output I got from R.

Thomas Bilach
  • If you put a dummy variable in for every country, the dummy variables cause problems because they are mutually dependent (each one is $1$ minus the sum of the others). – Henry Apr 04 '21 at 00:28
  • Technically, if you include `LOCATION - 1`, then all country-specific coefficients should appear in your summary output. It appears there is collinearity, but it's hard to diagnose without seeing your data. I would look at the model matrix and assess for any redundancies. – Thomas Bilach Apr 04 '21 at 01:05
  • I used the `alias` command in R, which I think tells you whether there is collinearity among the variables. I think the collinearity involves the Private dummy, but dropping one country dummy variable seems to fix the problem. – Omar Perez Rodriguez Apr 04 '21 at 01:48
  • @OmarPerezRodriguez This is a quick trick, but try specifying the country dummies first in terms of variable ordering. Does R drop `Private` or possibly any other variables? – Thomas Bilach Apr 04 '21 at 03:41
  • @ThomasBilach I put the country dummy variables first in the equation and then ran the regression. This time, the Private variable shows "NA" for its statistics. What does that mean? – Omar Perez Rodriguez Apr 04 '21 at 23:25
  • One country effect is completely collinear with the `Private` dummy variable. In general, R uses variable ordering to break collinearity. If the country effects are specified first, then R will estimate all country-specific intercepts, but now it must drop `Private` as it is redundant with the dummy variable for the United States. I can't comment on why this is happening, but including a full set of country effects completely absorbs `Private`. I would start by investigating how `Private` is coded within each country. – Thomas Bilach Apr 05 '21 at 18:12
  • @ThomasBilach We now know that the Private dummy is collinear with another dummy variable, but how do I fix that problem? – Omar Perez Rodriguez Apr 05 '21 at 23:22
  • In practice, simply drop the variable. Note, R self-corrects by removing any redundant regressors. You may not have a problem at all. Is it safe to say that `Private` doesn't vary over time, particularly within the United States? – Thomas Bilach Apr 05 '21 at 23:38
  • @ThomasBilach The Private dummy represents the presence of private schools in different countries and shows the effect of private schools on test scores. I cannot drop the Private dummy because it is a very important variable in my analysis. At least in the data I have (countries' test scores for different years), the Private dummy does not vary. Would it be a good idea to just drop the USA dummy variable? – Omar Perez Rodriguez Apr 06 '21 at 00:01
  • So the `Private` dummy is either all 0’s or all 1’s within a country, correct? – Thomas Bilach Apr 06 '21 at 00:05
  • Yes, the Private dummy is all 0s and 1s. – Omar Perez Rodriguez Apr 06 '21 at 00:10
  • And is it like that *within* countries? In other words, the dummy doesn’t change over the time periods in any one country? – Thomas Bilach Apr 06 '21 at 00:18
  • The Private dummy does not change over the time periods. – Omar Perez Rodriguez Apr 06 '21 at 00:55
  • Then the presence or absence of private schools is a fixed characteristic of each jurisdiction. A country either has private schools/universities or it doesn't. The country fixed effects already account for the time-constant attributes specific to your countries. This is why `Private` was dropped from your model. Does that make sense? – Thomas Bilach Apr 06 '21 at 02:38
  • You're saying fixed-effects models account for time-constant attributes? – Omar Perez Rodriguez Apr 07 '21 at 02:12
  • Precisely. The country fixed effects account for all *time-invariant* attributes specific to your countries. – Thomas Bilach Apr 07 '21 at 06:53
  • @ThomasBilach OK, thank you. I'll just drop the Private dummy since the fixed effects already adjust for it. – Omar Perez Rodriguez Apr 08 '21 at 00:18
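
For anyone who wants to reproduce the behaviour discussed in the comments, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the `panel` data frame, the four country codes, the years, and the coefficient values are invented and are not the asker's data. It only uses base R and tries to show three things from the thread: variable ordering decides which term `lm` reports as `NA`, `alias()` flags the redundancy, and a dummy that is constant within each country disappears once it is demeaned within country.

```r
# Simulated panel (hypothetical data): 4 countries observed over 5 years.
set.seed(1)
panel <- expand.grid(year = 2001:2005,
                     LOCATION = c("AUS", "CAN", "MEX", "USA"))
panel$LOCATION <- factor(panel$LOCATION)

# Private is constant within each country, as described in the thread.
panel$Private <- as.numeric(panel$LOCATION %in% c("CAN", "USA"))

# A regressor that does vary over time within countries.
panel$RGDPCapita <- rnorm(nrow(panel), mean = 30, sd = 5)
panel$Value <- 500 + 2 * panel$RGDPCapita + 10 * panel$Private +
  rnorm(nrow(panel))

# Private listed before the country dummies: the last country dummy is aliased.
fit_a <- lm(Value ~ RGDPCapita + Private + LOCATION - 1, data = panel)

# Country dummies listed first: Private is aliased instead.
fit_b <- lm(Value ~ LOCATION + RGDPCapita + Private - 1, data = panel)

print(coef(fit_a))
print(coef(fit_b))

# Rank deficiency: 6 columns in the model matrix but only 5 are independent.
print(c(columns = ncol(model.matrix(fit_a)), rank = fit_a$rank))

# alias() reports which term is a linear combination of the others.
print(alias(fit_a))

# Within-country demeaning (what fixed effects do) wipes out Private entirely.
private_within <- panel$Private - ave(panel$Private, panel$LOCATION)
print(all(private_within == 0))
```

In this sketch the coefficient that `fit_a` reports for `Private` equals the `LOCATIONUSA` intercept in `fit_b`, so it is a relabelled country effect rather than an estimate of the effect of private schools, and it would generally change if a different country dummy were dropped.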
