5

When you run a regression of ice cream sales on shark attacks, you find a significant coefficient. But that is only because of a confounding variable: temperature.

But how do you correct for this confounder? If you also add temperature to the model, you have two correlated predictors and hence a multicollinearity problem. So once you have found the confounder temperature, is the only option to leave shark attacks out of the model?

Kasper

2 Answers

9

First, just because two variables are fairly highly correlated does not mean they are collinear to a problematic degree. Problematic collinearity is best examined with condition indices. See the work of David Belsley, or see my dissertation Collinearity Diagnostics in Multiple Regression: A Monte Carlo Study.
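
For illustration, here is a minimal sketch of such a diagnostic in Python; the data are simulated and the variable names are my own. A common rule of thumb flags condition indices above about 30 as problematic.

```python
import numpy as np

def condition_indices(X):
    """Belsley-style condition indices: scale each column of the design
    matrix to unit length, then take ratios of the largest singular value
    to each singular value."""
    Xs = X / np.linalg.norm(X, axis=0)   # column equilibration
    s = np.linalg.svd(Xs, compute_uv=False)
    return s.max() / s

rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(20, 5, n)
sharks = 0.3 * temperature + rng.normal(0, 1, n)        # correlated with temperature
X = np.column_stack([np.ones(n), temperature, sharks])  # include the intercept column
print(condition_indices(X))   # indices above ~30 are commonly flagged as problematic
```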

Second, if you did find high collinearity, you could address it with methods such as ridge regression, which give biased estimates but with lower variance when collinearity is present.
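
A minimal sketch of the ridge idea, again on simulated data; the penalty `alpha` here is chosen purely for illustration, and in practice you would tune it, e.g. by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n = 200
temperature = rng.normal(20, 5, n)
sharks = 0.3 * temperature + rng.normal(0, 0.5, n)   # nearly collinear with temperature
sales = 2.0 * temperature + rng.normal(0, 3, n)      # sales driven by temperature alone
X = np.column_stack([temperature, sharks])

ols = LinearRegression().fit(X, sales)
ridge = Ridge(alpha=10.0).fit(X, sales)              # alpha is illustrative only
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)            # shrunken, lower-variance estimates
```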

Third, the effects of collinearity are seen in the variances of the parameter estimates, not in the parameter estimates themselves. So, when you see that the parameter estimate for shark attacks is greatly reduced once temperature is added, that is a sign of confounding, not of collinearity.
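
To make that concrete, here is a small simulation (my own, with made-up effect sizes) in which sales depend on temperature only; adding temperature to the model makes the shark coefficient collapse:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
temperature = rng.normal(20, 5, n)
sharks = 0.3 * temperature + rng.normal(0, 1, n)   # attacks rise with temperature
sales = 2.0 * temperature + rng.normal(0, 3, n)    # sales depend only on temperature

# Shark attacks alone: a 'significant' coefficient, purely via the confounder.
m1 = sm.OLS(sales, sm.add_constant(sharks)).fit()
# Shark attacks plus temperature: the shark coefficient collapses toward zero.
m2 = sm.OLS(sales, sm.add_constant(np.column_stack([sharks, temperature]))).fit()
print(m1.params)
print(m2.params)
```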

Peter Flom
  • I am still confused about the concepts. On the one hand, adding a confounding variable improves your model; a textbook example is adding a gender dummy. On the other hand, Wikipedia says a confounder is typically correlated with both the independent and the dependent variable, which increases the confidence intervals of the coefficient estimates (multicollinearity). – Kasper Jan 12 '14 at 12:54
  • Adding a confounding variable ought to make it clear that the original independent variable is not needed, e.g. that you ought to model ice cream sales by temperature, not shark attacks. – Peter Flom Jan 12 '14 at 12:57
  • OK, and in the case of adding the gender dummy, that isn't really a confounding variable, just another significant predictor? – Kasper Jan 12 '14 at 12:57
  • Indeed, in most cases anyway. – Peter Flom Jan 12 '14 at 12:58
  • OK. And why is it then, exactly, that confounding variables and multicollinearity are rarely mentioned in the same chapter (e.g. on the Wikipedia page on confounding, multicollinearity is not mentioned)? – Kasper Jan 12 '14 at 13:01
  • I can't say why people don't mention something; you'd have to ask the authors :-). Perhaps because the reason for confounding is clear in this case, as is the remedy? – Peter Flom Jan 12 '14 at 13:03
1

If you cannot use a linear regression with a control variable (or, relatedly, the partial correlation), another option is to stratify on the temperature variable.

To do so, categorize temperature into several, say five, categories, using the quintiles of temperature as thresholds. Then run one conditional regression model per category, i.e. for the units in each temperature category, a regression of sales on shark attacks.

The net effect of shark attacks is then found by taking the weighted average of the regression coefficient across the temperature categories, weighted by the category frequencies.
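
Here is a minimal sketch of the whole procedure in Python; the data and effect sizes are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
temperature = rng.normal(20, 5, n)
sharks = 0.3 * temperature + rng.normal(0, 1, n)
sales = 2.0 * temperature + rng.normal(0, 3, n)

# Stratify on the quintiles of temperature.
edges = np.quantile(temperature, [0.2, 0.4, 0.6, 0.8])
strata = np.digitize(temperature, edges)              # stratum labels 0..4

slopes, weights = [], []
for s in range(5):
    mask = strata == s
    # Within-stratum simple regression of sales on shark attacks.
    slope, intercept = np.polyfit(sharks[mask], sales[mask], 1)
    slopes.append(slope)
    weights.append(mask.sum())

net_effect = np.average(slopes, weights=weights)      # frequency-weighted average
print(net_effect)   # much closer to zero than the naive pooled slope
```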

I believe this relates to an argument by Cochran (1968) that stratification into five or six classes of this kind can remove roughly 90% of the bias due to confounding.

tomka