3

Can I ignore the multicollinearity problem if all the regression coefficients are highly significant?

My data set is large: I have several regression models, each fitted on between 2958 and 11646 observations, with six independent variables in each model (so the total count of data values is six times the 2958–11646 observations). All of the resulting coefficients are significant at the 0.01 level. The only issue I see is that one variable has a correlation of 0.9 with another variable, and I do not want to remove either of them.

I am trying to see the effect of a one-unit increase in this variable while holding all the other variables constant. Can I keep this variable?

Besides, if I delete one of the variables with a high VIF (between 13 and 14), all the other VIFs are safe, but the intercept becomes insignificant in all cases.

I am also referring to the comment at the following website: http://www.researchconsultation.com/multicollinearity-multiple-regression.asp

So, in sum, my ultimate goal is to use the final output of the logistic regression model built from the independent variables and one binary dependent variable. If so, do you think I can ignore the multicollinearity problem?

Eric
  • "Has the correlation of 0.9"? Can you please elaborate on the meaning of this phrase? How did you conclude this? How big is your sample and what is the number of parameters in your model? – usεr11852 Jun 22 '16 at 18:00
  • Of course. Please check now. – Eric Jun 22 '16 at 18:05
  • The numerical output is the one I need: I want to show how this final numerical output from my other model (a function of y, not the logistic regression itself) changes with different emphasis on an independent variable, increased by up to two units. – Eric Jun 24 '16 at 17:31

3 Answers

10

What is your ultimate goal? Are you going to use the model to make predictions of the mean response? If so, correlation is not a problem. However, if you want to make inferences, then you have to think about it.

I would suggest looking up the VIF (variance inflation factor) to see whether there really is multicollinearity.
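That suggestion is easy to try. Below is a minimal sketch in Python with NumPy (rather than the R the asker mentions, and on simulated stand-in data, not the asker's) that computes each predictor's VIF as 1/(1 − R²) from regressing it on the remaining predictors:

```python
import numpy as np

def vif(X):
    """VIF of each column: 1 / (1 - R^2), where R^2 comes from
    regressing that column on the remaining columns plus an intercept."""
    n, p = X.shape
    scores = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        scores.append(1.0 / (1.0 - r2))
    return scores

rng = np.random.default_rng(0)
x1 = rng.normal(size=2000)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=2000)  # corr(x1, x2) ~ 0.9
x3 = rng.normal(size=2000)                                   # independent predictor
X = np.column_stack([x1, x2, x3])
print([round(v, 2) for v in vif(X)])  # x1 and x2 inflated, x3 near 1
```

With a pairwise correlation of 0.9, the two collinear columns get a VIF of roughly 1/(1 − 0.81) ≈ 5, while the independent column stays near 1.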

p.s. I could not comment since I've just signed up and have not yet reached the minimum reputation.

Zamir Akimbekov
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/41571/discussion-on-answer-by-zamir-akimbekov-can-i-ignore-multicolinearity-problem-if). – whuber Jun 23 '16 at 16:45
  • 1
    I would add a point: multicollinearity is not a problem in a prediction context, provided the correlation between your predictors doesn't change! – Metariat Jun 24 '16 at 19:32
7

You can ignore multicollinearity for a host of reasons, but not because the coefficients are significant. In fact, one of the manifestations of the multicollinearity problem is two or more variables that are highly significant when put into the regression together, yet not significant at all when added one by one.

For instance, in econometrics you'll get very significant coefficients if you include several cross exchange rates, e.g. British pound to dollar, euro to dollar, and Swiss franc to dollar, even though individually they may not be significant in your model.

Aksakal
  • Oh sorry. But these are the VIF values that I get from the linear regression model, while I am actually using a logistic regression model. Is that safe? – Eric Jun 22 '16 at 19:10
  • The VIF values for the logistic regression are above 10, which I got by applying a VIF function to the logistic regression in R. However, the VIF values for the linear regression model are reasonable, as I've described above. If I am using a logistic regression model, should I worry about this? – Eric Jun 22 '16 at 19:15
  • 2
    You're looking for a mechanical decision rule. There are no absolute rules here. You can go with one of the rules of thumb like 10 or 5, but they all rest on shaky ground. You have to make a judgement call based on the entirety of the evidence, where simple rules can be one input into your decision. – Aksakal Jun 22 '16 at 19:48
  • Besides, my logistic regression has all significant coefficients, with p-values all lower than 0.01, when they are put into one logistic regression together at the same time, not separately as you mentioned above. If so, do I still need to worry about multicollinearity if the VIFs of these two are around 13–14? – Eric Jun 23 '16 at 00:48
  • VIF basically does what I described: it calculates the variance inflation relative to the no-correlation case. You really need to understand the tests you're planning to use. Again, whether you have to worry or not cannot be based on just one test. A VIF of 13 would be considered too high by many. – Aksakal Jun 23 '16 at 01:09
  • Thank you. Does this still matter even when all the p-values are low enough, i.e. less than 0.01? What I've heard is that multicollinearity affects the t-value, which in turn affects the p-value. If the p-values are safe enough, does this still matter? – Eric Jun 23 '16 at 08:28
  • Besides, if I delete one of the variables with a high VIF, all the other VIFs are safe, but the intercept becomes insignificant in all cases. – Eric Jun 23 '16 at 12:02
  • I am also referring to the comment at the following website: http://www.researchconsultation.com/multicollinearity-multiple-regression.asp My main purpose is to use the final output from the logistic regression model. This output is produced by increasing all the independent variables by one unit, while increasing the independent variable I am interested in by two units. I repeat this for each of the independent variables and produce several outputs of the dependent variable. If so, do I still need to worry about multicollinearity? – Eric Jun 23 '16 at 13:16
  • So, in sum, my ultimate goal is to use the final output from the logistic regression model generated from the independent variables and one binary dependent variable. If so, do you think I can ignore the multicollinearity problem? – Eric Jun 23 '16 at 21:16
  • Your question states that you want to make inferences about the coefficients of your variables. In this case multicollinearity is an issue. If you're simply forecasting, then it's not a big deal usually. – Aksakal Jun 23 '16 at 21:23
  • Oh, maybe my purpose was not conveyed properly. I have a total of six independent variables and one binary dependent variable. I want to see what the total output becomes as I increase the independent variable of interest by two units while increasing the other five independent variables by only one unit each. Then I use the final output, which is between 0 and 1 like a probability, in another model of my own. If this is so, do you think multicollinearity is a problem? – Eric Jun 23 '16 at 21:27
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/41583/discussion-between-eric-and-aksakal). – Eric Jun 23 '16 at 21:36
2

Luckily, there is a diagnostic tool called the variance inflation factor (VIF) that allows you to assess whether you have a multicollinearity problem. Usually, VIF scores > 10 are a cause for concern.
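For intuition about where a cutoff like 10 comes from, the VIF can be translated back and forth into the R² of regressing one predictor on the rest (plain arithmetic; the asker's VIF of 13–14 appears below only as an illustrative input):

```python
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on all the other predictors; the mapping is easy to invert.
def vif_from_r2(r2):
    return 1.0 / (1.0 - r2)

def r2_from_vif(v):
    return 1.0 - 1.0 / v

print(r2_from_vif(10.0))              # the VIF > 10 rule corresponds to R^2 > 0.9
print(round(vif_from_r2(0.81), 2))    # two predictors correlated at r = 0.9 give VIF ~ 5.26
print(round(r2_from_vif(13.0), 3))    # a VIF of 13 corresponds to R^2 ~ 0.923
```

So a VIF of 13–14 says the flagged predictor is over 92% linearly explained by the others, which is why many would call it too high.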

Y. Velez
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/41560/discussion-on-answer-by-y-velez-can-i-ignore-multicolinearity-problem-if-all-th). – whuber Jun 23 '16 at 14:58