
I'm using the mtcars dataset in R. I used the car package to estimate the VIF, but since I have factor variables I got a table with GVIF and GVIF^(1/(2*Df)) values. In another question, Which variance inflation factor should I be using: $\text{GVIF}$ or $\text{GVIF}^{1/(2\cdot\text{df})}$?, John Fox, co-author of the paper that introduced the GVIF (https://www.tandfonline.com/doi/abs/10.1080/01621459.1992.10475190#.U2jkTFdMzTo), mentioned that they recommend using GVIF^(1/(2*Df)), but I don't know whether I should apply the usual rule of thumb of < 5 for the standard VIF or some other cutoff.

This is my code:

library(car)    # for vif()
library(dplyr)  # for %>% and mutate()

mtcars2 <- within(mtcars, {
  vs <- factor(vs, labels = c("V", "S"))
  am <- factor(am, labels = c("Automatic", "Manual"))
  cyl  <- ordered(cyl)
  gear <- ordered(gear)
  carb <- ordered(carb)
})

mtcars2$loghp <- log(mtcars2$hp)

mtcars2 <- mtcars2 %>%
  dplyr::mutate(cylnum = as.numeric(cyl)) %>%
  dplyr::mutate(cylcat = cut(cylnum, breaks = c(0, 1, 2, 3),
                             labels = c("Cyl_4", "Cyl_6", "Cyl_8")))

# keep all predictors except hp (replaced by loghp), cylnum, and cylcat
mtcars2_lm <- mtcars2[, c(1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12)]
model1 <- lm(mpg ~ ., data = mtcars2_lm)
vif(model1)
            GVIF Df GVIF^(1/(2*Df))
cyl    98.027045  2        3.146563
disp   57.217057  1        7.564196
drat    7.105793  1        2.665669
wt     23.490085  1        4.846657
qsec   10.731794  1        3.275942
vs      7.354487  1        2.711916
am      9.936800  1        3.152269
gear   50.681013  2        2.668157
carb  244.371502  5        1.733026
loghp  14.626620  1        3.824476
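
For reference, my understanding from that answer is that for a term with Df = 1 the GVIF reduces to the ordinary VIF, so GVIF^(1/(2*Df)) is on the scale of sqrt(VIF); if that is right, the usual cutoff of 5 would translate to sqrt(5) ≈ 2.24 in the last column, but that translation is exactly what I'm unsure about. This is how I would apply it (a sketch assuming the square-root translation holds):

# flag terms whose GVIF^(1/(2*Df)) exceeds sqrt(5), i.e. the VIF < 5
# rule of thumb carried over to the square-root scale
v <- vif(model1)  # matrix with columns GVIF, Df, GVIF^(1/(2*Df))
v[v[, "GVIF^(1/(2*Df))"] > sqrt(5), , drop = FALSE]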
  • Welcome to Cross Validated! Threshold for what? – Dave Jan 10 '22 at 18:01
  • For deciding which variables I should delete from my model. I have read that the rule of thumb for continuous variables is 5, so you keep every variable whose VIF is below 5. – Begdev Jan 10 '22 at 19:40
  • Why do you want to delete variables from your model? – Dave Jan 10 '22 at 19:44
  • To prevent the multicollinearity problem. – Begdev Jan 10 '22 at 19:53
  • Please read over [this thread](https://stats.stackexchange.com/q/168622/28500) for reasons why you might not need to worry about VIF much at all. High multicollinearity might lead to high-magnitude covariances among coefficient estimates, but for a predictive model that's not really a problem. – EdM Jan 10 '22 at 19:53
  • Also keep in mind that, yes, you might decrease your variance by getting stable coefficient estimates, but that could come at the expense of biasing your model to discount a predictor that does matter, perhaps with so much bias that the decrease in variance is not worth the increase in bias. – Dave Jan 10 '22 at 20:06
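
To see the covariance inflation EdM mentions, one quick check (a sketch using base R's vcov() and cov2cor() on the model above, not something from the linked thread):

# correlation matrix of the coefficient estimates of model1;
# off-diagonal entries near ±1 reflect the collinearity the GVIFs flag
round(cov2cor(vcov(model1)), 2)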

0 Answers