I suspect there is a high degree of multicollinearity among the independent variables in my data, each of which is ordinal. The original model is:
library(logistf)
EC_all <- logistf(Erad_contr ~ Entry_risk + Entry_conf + Establishment_risk + Establishment_conf + Spread_risk + Spread_conf + Impacts_Risk + Impacts_Conf, data = Published, family = "binomial")
I then attempted to get VIF scores using the following:
library(car)
EC_test <- lm(Erad_contr ~ Entry_risk + Entry_conf + Establishment_risk + Establishment_conf + Spread_risk + Spread_conf + Impacts_Risk + Impacts_Conf, data = Published)
vif(EC_test)
                        GVIF Df GVIF^(1/(2*Df))
Entry_risk          7.882987  3        1.410745
Entry_conf         14.858967  3        1.567947
Establishment_risk  8.755895  3        1.435655
Establishment_conf 26.363955  3        1.725183
Spread_risk         7.105005  4        1.277749
Spread_conf         8.517452  3        1.429064
Impacts_Risk        7.951980  4        1.295864
Impacts_Conf        9.266215  3        1.449274
Should I be looking at GVIF, which seems very high, or at GVIF^(1/(2*Df)), which seems more reasonable? Either way, have I done this correctly? I did not create dummy variables for this. I have read that you should do so for categorical data, but I have not found much information on ordinal data. If my approach is incorrect, how should I calculate VIF scores, or is there a better alternative?
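To make the relationship between the two columns concrete: the third column is simply the first raised to the power 1/(2*Df), which for a term with one degree of freedom reduces to the square root of the ordinary VIF. Below is a minimal sketch that uses only the numbers printed above. The names `adj` and `adj_sq` are illustrative, and reading the squared value against the usual VIF rules of thumb is a common convention, not something this output itself prescribes.

```r
# GVIF and Df values copied from the vif(EC_test) output above
gvif <- c(Entry_risk = 7.882987, Entry_conf = 14.858967,
          Establishment_risk = 8.755895, Establishment_conf = 26.363955,
          Spread_risk = 7.105005, Spread_conf = 8.517452,
          Impacts_Risk = 7.951980, Impacts_Conf = 9.266215)
df <- c(3, 3, 3, 3, 4, 3, 4, 3)

# Third column of the output: GVIF^(1/(2*Df))
adj <- gvif^(1 / (2 * df))

# Squaring gives GVIF^(1/Df), which sits on the scale of an ordinary VIF
adj_sq <- adj^2

print(round(cbind(GVIF = gvif, Df = df, adj = adj, adj_sq = adj_sq), 3))
```

The raw GVIF grows with the number of dummy columns a factor contributes, so it is not directly comparable across terms with different Df; the adjusted column is the one designed for that comparison.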
UPDATE
Note that this update uses a slightly different model, shown below, but the question is the same. The original model is:
EC_Conc <- glm(Erad_contr ~ Conc_Risk+Conc_Conf, data = Published, family = "binomial")
I attempted to create dummy variables as follows:
For_Vif <- fastDummies::dummy_cols(For_Vif, select_columns = c("Conc_Risk", "Conc_Conf"))
I then fitted a model with each of the dummy variables as an independent variable and attempted to get VIF values:
VifModel3 <- lm(Erad_contr ~ Conc_Risk_Vlow + Conc_Risk_Low + Conc_Risk_Med + Conc_Risk_High + Conc_Risk_Vhigh +
Conc_Conf_Low + Conc_Conf_Med + Conc_Conf_High + Conc_Conf_Vhigh, data = For_Vif)
vif(VifModel3)
This yields the error:
Error in vif.default(VifModel3) :
there are aliased coefficients in the model
Is this closer to correct than what I did before? How can I fix this error and get my VIF scores?
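For context on the error itself, here is a minimal sketch on simulated data (not the Published data; `toy`, `risk`, and the level names are placeholders) of why including one dummy for every level of a factor alongside an intercept produces an aliased coefficient. The indicator columns sum to 1 in every row, which duplicates the intercept column, so `lm()` cannot estimate one of them and reports it as `NA`.

```r
set.seed(1)
n <- 200
lev <- c("Vlow", "Low", "Med", "High", "Vhigh")

# Simulated five-level ordinal predictor and an arbitrary 0/1 response
risk <- factor(sample(lev, n, replace = TRUE), levels = lev)
y <- rbinom(n, 1, 0.5)

# One indicator column per level (no reference level dropped)
D <- model.matrix(~ risk - 1)
colnames(D) <- lev
toy <- data.frame(y = y, D)

# Every row has exactly one indicator set, so the columns sum to the intercept
print(all(rowSums(D) == 1))

# All five dummies plus an intercept: one coefficient cannot be estimated
fit_all <- lm(y ~ Vlow + Low + Med + High + Vhigh, data = toy)
print(coef(fit_all))             # one entry is NA
print(alias(fit_all)$Complete)   # shows which column is a combination of the others

# Dropping one level as the reference removes the exact dependency
fit_ref <- lm(y ~ Low + Med + High + Vhigh, data = toy)
print(anyNA(coef(fit_ref)))
```

Dropping one level per factor is the same thing R does internally when a factor is passed to `lm()` directly, so hand-made dummies are only needed if per-dummy diagnostics are specifically wanted.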
UPDATE 2
As suggested by @Randcelot, I removed the lowest category of each variable from the lm() call:
VifModel3 <- lm(Erad_contr ~ Conc_Risk_Low+Conc_Risk_Med+Conc_Risk_High
+Conc_Risk_Vhigh+Conc_Conf_Med+Conc_Conf_High+Conc_Conf_Vhigh, data = For_Vif)
vif(VifModel3)
      Conc_Risk_Low       Conc_Risk_Med      Conc_Risk_High
          12.951637           21.451194           20.794598
    Conc_Risk_Vhigh       Conc_Conf_Med      Conc_Conf_High
           1.976190            4.152511            4.469138
Conc_Conf_Very_high
           1.532027
Each variable now has several VIF scores, one per dummy. All of the Conc_Conf scores look acceptable. For Conc_Risk, only the Vhigh score looks acceptable; the others do not. Is it safe to conclude that, because some of the scores are very high, multicollinearity is present? Since there are only two variables here, could I simply remove one of them?
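One thing worth keeping in mind when reading per-dummy scores like these: they depend on which level is dropped as the reference, whereas the generalized VIF summarizes a whole factor in one number. The following is a rough base-R sketch on simulated data (again not the Published data) that computes a GVIF per factor from the correlation matrix of the dummy columns and checks that it does not change when the reference level changes. The helper `gvif_by_term` is hypothetical and hand-rolled for illustration. It is not car's implementation, and although it should agree with `car::vif()` for a linear model, treat that as an assumption to verify.

```r
set.seed(42)
n <- 300
lev_risk <- c("Vlow", "Low", "Med", "High", "Vhigh")
lev_conf <- c("Low", "Med", "High", "Vhigh")

# Simulated factors; confidence loosely tracks risk so the two are associated
risk_i <- sample(1:5, n, replace = TRUE)
conf_i <- pmin(4, pmax(1, round(risk_i * 0.8 + rnorm(n, sd = 0.7))))
toy <- data.frame(
  Conc_Risk = factor(lev_risk[risk_i], levels = lev_risk),
  Conc_Conf = factor(lev_conf[conf_i], levels = lev_conf)
)

# Hypothetical helper: GVIF = det(R11) * det(R22) / det(R), one value per term,
# where R is the correlation matrix of the non-intercept design columns
gvif_by_term <- function(rhs, data) {
  X <- model.matrix(rhs, data)
  term_id <- attr(X, "assign")
  keep <- term_id > 0                      # drop the intercept column
  R <- cor(X[, keep, drop = FALSE])
  term_id <- term_id[keep]
  labels <- attr(terms(rhs), "term.labels")
  gvif <- sapply(seq_along(labels), function(k) {
    idx <- which(term_id == k)
    det(R[idx, idx, drop = FALSE]) * det(R[-idx, -idx, drop = FALSE]) / det(R)
  })
  dof <- as.vector(table(term_id))
  data.frame(term = labels, GVIF = gvif, Df = dof, adj = gvif^(1 / (2 * dof)))
}

# Reference level Vlow (the default, first level)
g_vlow <- gvif_by_term(~ Conc_Risk + Conc_Conf, toy)

# Same data with Med as the reference level for Conc_Risk
toy_med <- transform(toy, Conc_Risk = relevel(Conc_Risk, ref = "Med"))
g_med <- gvif_by_term(~ Conc_Risk + Conc_Conf, toy_med)

print(g_vlow)
print(all.equal(g_vlow$GVIF, g_med$GVIF))
```

If the per-factor values are moderate while individual dummy scores are large, a sparsely populated reference category is one possible explanation worth checking before dropping a variable.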