I recently encountered a problem in my study of conflict and performance. The two types of conflict in the study, task conflict and relationship conflict, both correlated with performance, at $-.39$ ($p < .01$) and $-.37$ ($p < .01$) respectively. When I put these in a regression, the model came out significant, and it explained $22.1\%$ of the variance ($F(3, 82) = 7.76$, $p < .01$). Neither of the two IVs, however, had a significant coefficient! There were no other variables in the regression analysis. How could this be?
-
It's almost certainly caused by high correlation of your two predictors with each other. You might want to google the terms "variance inflation factor" and "multicollinearity". Put simply, if two or more of the predictors are strongly related to each other, the model cannot accurately say which of the variables is important, just that at least one of them is. Posted as a comment instead of an answer because I don't have time to explain in detail. – Erik Aug 23 '13 at 12:42
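To illustrate, here is a minimal sketch of a VIF check in Python with statsmodels, on simulated data (the variable names and the degree of correlation are assumptions for illustration, not taken from the question):

```python
# Minimal sketch: variance inflation factors (VIFs) for two
# correlated predictors. Data and names are hypothetical.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 86
task = rng.normal(size=n)
relationship = 0.3 * task + rng.normal(size=n)  # built-in correlation

X = sm.add_constant(np.column_stack([task, relationship]))
for i, name in ((1, "task"), (2, "relationship")):
    # VIF = 1 / (1 - R^2) from regressing this predictor on the others;
    # values above roughly 5-10 are commonly read as problematic.
    print(name, variance_inflation_factor(X, i))
```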
-
@Erik But it certainly is the right answer imho :) – Rasmus Bååth Aug 23 '13 at 12:46
-
I agree in spirit with @Erik, but think the tone a little alarmist. There is some correlation surely, but I don't think that _high_ correlation is inevitable with these numbers. (It also depends on what you call high.) Also, running away from multiple regression because of correlations between predictors would rule out almost all multiple regressions.... – Nick Cox Aug 23 '13 at 12:53
-
@NickCox Well, it's not as if I said don't use the model :) And just from my gut feeling I thought that a jump from $p < .01$ to $p > .05$ in the linear model would require at least some substantial correlation, though it probably depends on sample sizes and so on. – Erik Aug 23 '13 at 12:58
-
@Erik I am sure that our views are similar here. You've qualified your previous statement, which is precisely what I was suggesting. – Nick Cox Aug 23 '13 at 13:09
2 Answers
From your correlations it is predictable that a regression on task conflict alone would have $R^2$ about $15\%$ and relationship conflict alone about $13\%$. (To see this, just square the correlations.)
So, using both predictors gives a gain of $7\%$ in one case and $9\%$ in the other case. Why not the full $15\%$ or $13\%$? The reason is that task and relationship conflict are correlated with each other, so adding one predictor does not add as much predictive information as you might think.
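To make the arithmetic concrete, here is a small sketch using the standard formula for the $R^2$ of two standardized predictors; the back-calculated intercorrelation of about $.31$ is an assumption-laden illustration that takes the model to contain only these two IVs:

```python
# Sketch: how the two-predictor R^2 depends on the intercorrelation
# r12 of the predictors, given the reported validities.
# Standard result for two standardized predictors:
#   R^2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2)
r1, r2 = -0.39, -0.37  # correlations with performance, from the question

for r12 in (0.0, 0.2, 0.31, 0.5):
    R2 = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
    print(f"r12 = {r12:.2f}  ->  R^2 = {R2:.3f}")

# r12 = 0.00  ->  R^2 = 0.289
# r12 = 0.31  ->  R^2 = 0.221  (matches the reported 22.1%)
# So only a moderate correlation between the predictors is needed
# to reproduce the reported figures.
```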
In essence, the two predictors are fighting each other for a share of the "explanation". This need not be fatal, as the model is a team effort, and it is often defensible to include non-significant predictors whenever a model is of (social or behavioural) scientific interest. But you might well:

- Consider scatter plots of all variables jointly in a scatter plot matrix in your favourite software. (If a scatter plot matrix is not easy in your favourite software, you deserve something better.)
- Consider transforming either or both predictors if relationships appear nonlinear.
- Consider adding an interaction term.
- Discuss the relative merits of the single-predictor models and the two-predictor model.
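Here is a minimal sketch of the first and third suggestions in Python (simulated data; the variable names are illustrative assumptions, not the study's):

```python
# Sketch: scatter plot matrix plus a model with an interaction term.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 86
df = pd.DataFrame({"task": rng.normal(size=n)})
df["relationship"] = 0.3 * df["task"] + rng.normal(size=n)
df["performance"] = (-0.4 * df["task"] - 0.3 * df["relationship"]
                     + rng.normal(size=n))

# Scatter plot matrix of all variables jointly
pd.plotting.scatter_matrix(df)
plt.show()

# 'task * relationship' expands to both main effects plus their product
fit = smf.ols("performance ~ task * relationship", data=df).fit()
print(fit.summary())
```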

There could be several things going on.
You may have outliers in the two IVs, which you need to remove using Cook's D or similar. Perform regression diagnostics.
You have to increase the alpha level. Are the two IVs not significant at the 95% level? Why not change alpha to 90%?
Finally, as part of the diagnostics, ensure the residuals are normally distributed and that there is no heteroskedasticity.
When these steps are performed, I have seen IVs become significant.
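For what the diagnostic steps themselves are worth, here is a minimal sketch with statsmodels and scipy on simulated data (the names, seed, and the $4/n$ flag are illustrative assumptions):

```python
# Sketch: Cook's distance, residual normality, and heteroskedasticity
# checks for an OLS fit. Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
n = 86
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([0.0, -0.4, -0.3]) + rng.normal(size=n)

results = sm.OLS(y, X).fit()
influence = results.get_influence()

# Influence/outlier screen: Cook's distance per observation
cooks_d, _ = influence.cooks_distance
print("observations with Cook's D > 4/n:", np.where(cooks_d > 4 / n)[0])

# Normality of residuals (Shapiro-Wilk)
print("Shapiro-Wilk:", stats.shapiro(results.resid))

# Heteroskedasticity (Breusch-Pagan)
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(results.resid, X)
print("Breusch-Pagan p-value:", lm_pval)
```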

-
-1 for increasing the alpha level. I don't think it should always be at 0.05, but doing an analysis, seeing no significance, and then just deciding ad hoc to increase the alpha level is terrible advice. – Erik Aug 23 '13 at 12:44
-
Removing outliers just because they are inconvenient and changing the alpha level in this way both, in my view, qualify as unethical as well as poor statistics and poor science. – Nick Cox Aug 23 '13 at 13:08
-
First of all, we don't know how many samples we have in the dataset. Assuming he has only a few, say less than 30, there is nothing unethical in statistics about increasing the alpha level. In fact, we are advised in the classroom to do this. You cannot be 95% confident about anything when you have less than 30 samples. Keep up the ethical work guys. – user16789 Aug 23 '13 at 14:01
-
The responses by Erik and @Nick maybe could have been a little more nuanced, but we should take them seriously. Instead of "increasing alpha" one could just report p-values: that's not unethical or terrible. Outliers can be removed provided the removal is documented and inferences are appropriately qualified, but *automatic* removal can lead to deceiving results. I suspect these negative reactions are not to the methods *per se* but to the way in which the otherwise useful advice in this answer seems to have been presented almost as a *recipe for making variables significant.* – whuber Aug 23 '13 at 14:21
-
I am happy to align myself with @whuber's more diplomatic formulation. The advice given ("need to remove [outliers]" "have to increase ... alpha level") was, however, far too dogmatic and unqualified to escape dissent. – Nick Cox Aug 23 '13 at 14:26
-
@user16789, I don't think what you said is a response to either of the previous comments. Of course you're right that it's not inherently unethical to use an $\alpha$ other than $.05$ but, if you decide beforehand to use $\alpha=.05$, fail to find significance and your solution is to increase the $\alpha$ level (or remove "outliers" or whatever), that's a problem. In the small sample situation you described you're right that your test would likely be underpowered. It's also worth noting that, with many tests, small sample $p$-values are only rough approximations anyhow. – Macro Aug 23 '13 at 14:27
-
1@user16789 "You cannot be 95% confident about anything when you have less than 30 samples" Sorry, no; this is at best confusing and at worst confused. Having small sample sizes is exactly when confidence intervals are most needed. Also, you seem to be using "confident" here in a psychological sense, which is quite different from what confidence intervals mean. – Nick Cox Aug 23 '13 at 14:28