Date, age, mrt and shops are all predictors in a dataset of 414 observations. Pearson's product-moment correlation between mrt and shops is sizeable and negative (-0.6, so in absolute value well above the minimum benchmark of 2/sqrt(n)). Yet the VIF for both is quite low. Does this mean there is multicollinearity or not? And why is the VIF so low if Pearson's r is -0.6?

PS: I have found a similar question here, but there Pearson's r is not negative, and that might make a difference. Any help would be much appreciated.

Reader 123

1 Answer

This is largely covered elsewhere, e.g., in my answer to "When can we speak of collinearity". Whether Pearson's $r$ is positive or negative makes no difference.

I have never heard of your "minimum benchmark", and it doesn't make any sense to me. Consider that if you only had $4$ data points, I gather your minimum benchmark would say that a pairwise correlation between variables of anything short of $r = 1.0$ would be fine (i.e., $2/\sqrt{4} = 1$), whereas if you had $1600$ data points, any $|r| \ge .05$ would be problematic (i.e., $2/\sqrt{1600} = .05$). I may be misunderstanding it, but that's nonsensical. Consider also that, unless you have perfect multicollinearity, the primary impact is a reduction of power, and that can still be overcome with sufficient $N$ (cf. my answer to "What is the effect of having correlated predictors in a multiple regression model?").
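
To make the arithmetic concrete, here is a small sketch that evaluates the $2/\sqrt{n}$ cutoff for the two sample sizes used above and for the $414$ observations in the question. The function name `benchmark` is only an illustrative label, not something from the textbook or any library.

```python
import math

def benchmark(n):
    """Return the 2/sqrt(n) cutoff for a sample of size n (hypothetical helper)."""
    return 2 / math.sqrt(n)

# The cutoff shrinks as n grows, so it cannot serve as a fixed
# threshold for "too much" correlation between predictors.
for n in (4, 414, 1600):
    print(n, round(benchmark(n), 3))
```

With $n = 414$ the cutoff is roughly $0.1$, so an $|r|$ of $0.6$ clears it easily, which says nothing about whether the correlation is high enough to cause trouble in the regression.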

By (arbitrary) rule of thumb, you have a 'problem with multicollinearity' when you have a ${\rm VIF} \ge 10$. With respect to pairwise correlations alone, that would imply $|r| \gtrapprox .95$.
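
As a rough illustration of why $r = -0.6$ goes with a low VIF, the sketch below assumes a model with only two predictors, in which case ${\rm VIF} = 1/(1 - r^2)$. The function names are hypothetical. With additional predictors (here, date and age) the VIF uses the $R^2$ from regressing each predictor on all the others, so the two-predictor value is only a lower bound.

```python
import math

def vif_two_predictors(r):
    """VIF implied by a pairwise correlation r, assuming exactly two predictors."""
    return 1.0 / (1.0 - r ** 2)

def abs_r_for_vif(vif):
    """|r| needed to reach a given VIF, under the same two-predictor assumption."""
    return math.sqrt(1.0 - 1.0 / vif)

# The sign of r drops out because only r**2 enters the formula.
print(round(vif_two_predictors(-0.6), 4))
print(round(vif_two_predictors(0.6), 4))

# Pairwise correlation that corresponds to the VIF >= 10 rule of thumb.
print(round(abs_r_for_vif(10), 4))
```

Under that assumption, $r = -0.6$ gives a VIF of about $1.56$, far below $10$, and reaching ${\rm VIF} = 10$ takes $|r| \approx .95$.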

Dave
gung - Reinstate Monica
  • Thank you very much for giving me links to other answers and still taking the trouble to answer my question - much appreciated! As regards the "minimum benchmark" it is from Newbold-Carson-Thorne: Statistics for Business and Economics (ISBN 9780273767060), eighth edition, page 84, bottom of the page. "A useful rule to remember is that a relationship exists if |r| >= 2/sqrt(n)" – Reader 123 Sep 02 '21 at 17:32
  • @Reader123, Oh, I see. They're giving a rule of thumb for eyeballing the statistical significance of a correlation. That doesn't matter. With respect to potential multicollinearity, it doesn't matter whether the correlation is 'real' in the population or not. Multicollinearity is about the correlations *in your sample* being 'too high'. – gung - Reinstate Monica Sep 02 '21 at 20:32
  • Thank you, @gung, I completely missed the aspect of one being about the population and the other about the sample only... :-| Thanks for your help. – Reader 123 Sep 03 '21 at 09:34
  • You're welcome, @Reader123. – gung - Reinstate Monica Sep 03 '21 at 13:21