I'm working on a logistic regression model and want to compute the variance inflation factor (VIF) for every predictor in it. Both the 'car' and 'HH' packages provide a vif function, so I computed it with one of them.

The glm call follows:

logitA1 <- glm(formula = Successful ~ CodeSnippet + I(Weekday=='Weekend') + 
           I(GMTHour=='Afternoon') + I(GMTHour=='Evening') + 
           I(GMTHour=='Night') + I(BodyLength=='Medium') + 
           I(BodyLength =='Long') + I(TitleLength=='Medium') + 
           I(TitleLength=='Long')+ SentimentPositiveScore + 
           SentimentNegativeScore + NTag + AvgUpperCharsPPost + URL + 
           IsTheSameTopicBTitle + I(UserReputation=='Low') + 
           I(UserReputation=='Established') + I(UserReputation=='Trusted'), 
           data=dsA1, family=binomial())

As far as I know, VIF is computed once per predictor, not once per level of each predictor, but the vif call returns:

CodeSnippet = 1.013995, I(Weekday == "Weekend") = 1.005570, 
I(GMTHour == "Afternoon") = 1.439526, I(GMTHour == "Evening") = 1.374073, 
I(GMTHour == "Night") = 1.322700, I(BodyLength == "Medium") = 1.082132,
I(BodyLength == "Long") = 1.072741, I(TitleLength == "Medium") = 3.534710, 
I(TitleLength == "Long") = 3.547255, SentimentPositiveScore = 1.056977,
SentimentNegativeScore = 1.036823, NTag = 1.015646, AvgUpperCharsPPost = 1.011055, 
URL = 1.014852, I(UserReputation == "Low") = 1.450856, 
I(UserReputation == "Established") = 1.447942, I(UserReputation == "Trusted") = 1.021978

As you can see, for the categorical variables that I wrapped in I(), the VIF is calculated for every single level rather than for the predictor as a whole. The values for the levels of a given predictor are mostly close to each other (for example, 3.534710 and 3.547255 for TitleLength), so I assume the real VIF for each predictor is not far from them. Why did I get these results, though? And how can I be sure of the real VIF value for a predictor with multiple levels, such as BodyLength, TitleLength, or UserReputation?

Anacarnil
  • I think it is about the way vif is implemented in these packages. Try to estimate the same model without using `I()` for factors, and then try the `vif` function in the `car` package. This might solve your problem. But that could make the question off-topic here. – T.E.G. Jan 11 '17 at 18:54
  • I tried doing as you said, the results using vif in car package seem reasonable, and there's no value above 1.10 anymore. Thank you very much for the reply! – Anacarnil Jan 12 '17 at 11:25
  • You are welcome. You might also find interesting the answer by John Fox (author of the `car` package) to this question about GVIF (reported by the `vif` function): http://stats.stackexchange.com/questions/70679/which-variance-inflation-factor-should-i-be-using-textgvif-or-textgvif . Other answers are also interesting (esp. Jan Philipp S') and somewhat related to your question. – T.E.G. Jan 12 '17 at 20:15
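
For anyone landing here later, the suggestion in the comments can be sketched roughly as follows. This is only an illustration on simulated data, not the original `dsA1`: the data frame `ds`, the reference levels "Short" and "Morning", and the reduced set of predictors are all assumptions made up for the example. The point it tries to show is that each `I(x == 'level')` becomes its own one-column term, so `vif` reports one value per dummy, whereas a factor entered directly is a single multi-column term, for which `car::vif` reports a GVIF together with its Df.

```r
# Simulated stand-in for dsA1 (hypothetical data; only the column names
# echo the question, and the reference levels are assumed).
set.seed(1)
n <- 500
ds <- data.frame(
  TitleLength = factor(sample(c("Short", "Medium", "Long"), n, replace = TRUE),
                       levels = c("Short", "Medium", "Long")),
  GMTHour = factor(sample(c("Morning", "Afternoon", "Evening", "Night"), n,
                          replace = TRUE),
                   levels = c("Morning", "Afternoon", "Evening", "Night")),
  NTag = rpois(n, 3)
)
ds$Successful <- rbinom(n, 1, plogis(-0.5 + 0.2 * ds$NTag))

# Hand-made dummies: every I(...) is a separate one-column model term.
fit_dummies <- glm(Successful ~ I(TitleLength == "Medium") +
                     I(TitleLength == "Long") +
                     I(GMTHour == "Afternoon") + I(GMTHour == "Evening") +
                     I(GMTHour == "Night") + NTag,
                   data = ds, family = binomial())

# Factors entered directly: one model term per predictor.
fit_factors <- glm(Successful ~ TitleLength + GMTHour + NTag,
                   data = ds, family = binomial())

# Count the terms each formula produces.
n_terms_dummies <- length(attr(terms(fit_dummies), "term.labels"))
n_terms_factors <- length(attr(terms(fit_factors), "term.labels"))

# car is optional here; the comparison of terms above works without it.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::vif(fit_dummies))  # one VIF per dummy column
  print(car::vif(fit_factors))  # GVIF, Df and GVIF^(1/(2*Df)) per predictor
}
```

Both fits describe the same model; only the grouping of columns into terms differs, and that grouping is what `vif` works from. When comparing predictors with different numbers of levels, the `GVIF^(1/(2*Df))` column is the one discussed in the linked GVIF question.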
