
When we build regression models, we need to check the correlations between attributes. Could anyone explain the difference between checking the pairwise correlations between attributes and checking for multicollinearity? What I found is that some attributes have high pairwise correlation, say greater than 0.6, but their VIFs are less than 3, which would suggest they are not correlated.

What's the correct way to check correlation and decide which attributes to drop?

kjetil b halvorsen
Nanan
  • VIF is not a measure of correlation; it measures multicollinearity. For how correlation, collinearity, and multicollinearity differ from each other and how to deal with them, my answer at https://stats.stackexchange.com/a/432543/79100 (and the ResearchGate posts linked there) gives a detailed picture of these concepts. – Dr Nisha Arora Jan 24 '21 at 06:13

1 Answer

  1. It's possible to have multicollinearity without any individual correlation being high.

    e.g. here's a correlation matrix of 10 variates, the largest of which (in absolute value) is below 0.3 (one way such a matrix can arise is sketched in the code after this list):

            [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
     [1,]    -   -0.264 -0.102 -0.140 -0.138 -0.115  0.156 -0.262 -0.085 -0.001
     [2,] -0.264    -   -0.040 -0.028 -0.216 -0.005 -0.246 -0.075 -0.004 -0.231
     [3,] -0.102 -0.040    -   -0.133 -0.082 -0.292 -0.144 -0.079  0.028 -0.206
     [4,] -0.140 -0.028 -0.133    -   -0.128  0.022 -0.249 -0.204 -0.139 -0.078
     [5,] -0.138 -0.216 -0.082 -0.128    -   -0.144 -0.049 -0.080 -0.116 -0.202
     [6,] -0.115 -0.005 -0.292  0.022 -0.144    -   -0.123  0.032 -0.131 -0.077
     [7,]  0.156 -0.246 -0.144 -0.249 -0.049 -0.123    -   -0.188 -0.222  0.071
     [8,] -0.262 -0.075 -0.079 -0.204 -0.080  0.032 -0.188    -    0.050 -0.052
     [9,] -0.085 -0.004  0.028 -0.139 -0.116 -0.131 -0.222  0.050    -   -0.236
    [10,] -0.001 -0.231 -0.206 -0.078 -0.202 -0.077  0.071 -0.052 -0.236    -  
    

    Yet the set of variables is perfectly collinear.

  2. A VIF of 3 doesn't say "not correlated". It says that the effect of the amount of multicollinearity you have on the variance of your estimates is not really large: a VIF of 3 means the variance of that coefficient estimate is 3 times what it would be if the predictors were uncorrelated.

  3. Consequently I'd pay more attention to the VIF than the individual correlations, but dropping variables isn't the only option when you have multicollinearity.
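
The answer doesn't say how the matrix above was generated, but one construction that produces the same pattern (an assumption for illustration, not necessarily the answerer's method) is to centre each row of a matrix of independent normals: every row then sums to zero, so the ten columns are perfectly collinear, yet the expected pairwise correlation is only -1/9 ≈ -0.11, matching the small negative entries above.

    set.seed(1)                       # arbitrary seed, for reproducibility
    n <- 100
    Z <- matrix(rnorm(n * 10), n, 10)
    X <- Z - rowMeans(Z)              # centre each row: rowSums(X) is identically zero
    round(cor(X), 3)                  # off-diagonal entries cluster around -1/9
    max(abs(rowSums(X)))              # ~1e-15: the columns are perfectly collinear

With exact collinearity like this the VIFs are undefined (R's lm() would simply drop one aliased column); the nearly-collinear case, where the VIFs are finite but huge, is sketched after the comments below.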

Glen_b
  • Thanks for your detailed explanation. So my question now is: what is the difference between pairwise correlation and multicollinearity? Pairwise correlation only checks whether the two attributes being examined move in the same direction, while multicollinearity checks whether there is any linear relationship between the attribute being examined and the rest of the attributes. Is that a correct understanding? – Nanan Jul 11 '17 at 12:18
  • That's pretty much it; the correlation looks only at pairs of variables, while multicollinearity is a property of the entire collection of variables -- if some linear combination of all of them (including the constant term) is 0 (or equivalently as you put it, if one of them can be written as a linear combination of the others) then you have perfect multicollinearity. If that nearly happens (instead of exactly happens) then the VIF will be high even though no correlation is large. – Glen_b Jul 11 '17 at 12:53
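
To make the last comment concrete, here is a small R sketch. The helper vif_by_hand is hypothetical, written for illustration rather than taken from any package; it computes VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing the j-th predictor on all the others. Adding a little noise to the perfectly collinear columns from the earlier sketch turns exact collinearity into near-collinearity: every pairwise correlation stays modest, yet the VIFs come out in the hundreds.

    # Hypothetical helper: VIF from first principles,
    # vif_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the rest.
    vif_by_hand <- function(X) {
      sapply(seq_len(ncol(X)), function(j) {
        r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
        1 / (1 - r2)
      })
    }

    set.seed(2)                                   # arbitrary seed
    n <- 100
    Z <- matrix(rnorm(n * 10), n, 10)
    X <- Z - rowMeans(Z) +                        # perfectly collinear columns ...
         matrix(rnorm(n * 10, sd = 0.01), n, 10)  # ... plus a little noise
    max(abs(cor(X)[upper.tri(cor(X))]))           # largest |pairwise correlation| stays modest
    round(vif_by_hand(X))                         # VIFs in the hundreds

This is exactly the situation the comment describes: no pair of variables looks alarming on its own, but each column is almost a linear combination of the others, so the VIFs blow up.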