Appropriate reasons to exclude independent variables from regression

Question

I am running a series of hierarchical regressions with many independent variables. All the IVs show a loose theoretical relationship to the DV. My supervisor has suggested excluding IVs from the regression based on whether they correlate with the DV (if they don't, they are out). The practice seems accepted enough that studies using it are published in high-end medical journals, but I can't find a reference that directly supports it.
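One way to see what this screening rule does is to apply it to IVs that are known to be unrelated to the DV. The sketch below is a hypothetical simulation (numpy only; the sample size, the number of candidate IVs and the seed are arbitrary assumptions). It keeps every IV whose correlation with the DV is "significant", using |t| > 1.96 as a normal approximation to p < 0.05 rather than an exact t-distribution cutoff.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 200                      # 100 subjects, 200 candidate IVs (assumed)
y = rng.normal(size=n)               # the DV
X = rng.normal(size=(n, k))          # IVs that are pure noise: none is related to y

# Pearson correlation of each IV with the DV
Xc = X - X.mean(axis=0)
yc = y - y.mean()
r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# t-statistic for H0: rho = 0; |t| > 1.96 approximates a two-sided p < 0.05
t = r * np.sqrt((n - 2) / (1 - r**2))
kept = np.flatnonzero(np.abs(t) > 1.96)
print(f"{kept.size} of {k} noise IVs pass the screen")
```

With a 5% threshold you should expect roughly 1 in 20 pure-noise IVs to survive the screen and enter the regression looking like a genuine predictor.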

I've also received some negative feedback about this process. First, because some of the excluded IVs are correlated with the included IVs, it has been suggested that excluding them will affect the coefficients of the IVs left in the regression. Second, it has been suggested that including only the measures expected to be significant masks the likelihood that a few significant correlations will emerge by chance.
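The first concern (omitted variable bias) can be illustrated with a small simulation. This is a sketch under invented assumptions, not a model of the actual data: both IVs truly affect the DV with coefficient 1, and x2 is correlated with x1 (slope 0.8). The helper `ols_coefs` is a hypothetical name for a plain least-squares fit via `numpy.linalg.lstsq`.

```python
import numpy as np

def ols_coefs(X, y):
    """Least-squares coefficients, with an intercept prepended as column 0."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)  # x2 is correlated with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # both IVs truly affect y

full = ols_coefs(np.column_stack([x1, x2]), y)  # x1 and x2 included
reduced = ols_coefs(x1, y)                      # x2 omitted
print(f"x1 coefficient, full model: {full[1]:.2f}; x2 omitted: {reduced[1]:.2f}")
```

In the full model the x1 coefficient is close to its true value of 1. With x2 omitted it drifts to about 1 + 1.0 × 0.8 = 1.8: the omitted coefficient times the slope of x2 on x1. If x2 were uncorrelated with x1, dropping it would not bias the x1 coefficient.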

I have found a reference suggesting that, if the variables are not 'important', excluding variables that correlate with other IVs is not a problem.

Can an IV's importance to a model be determined by whether it correlates with the DV?

Please clarify whether by "IV" you mean "Independent Variable" or "Instrumental Variable" — Alecos Papadopoulos, Jul 23 '15 at 09:20
Corrected and thanks. I'm referring to independent variables. — kvella, Jul 24 '15 at 06:10
I think you should search for "omitted variable bias", or read about that topic in, e.g., D.N. Gujarati, "Basic Econometrics" — , Jul 28 '15 at 06:18

score 3 · Answer 1 · edited Apr 13 '17 at 12:44

An independent variable that is not correlated on its own with the dependent variable may still be very important in a model. Multiple regression estimates the relation of each independent variable to the dependent variable with all the other variables taken into account. As a result, such a variable might become significant in the multiple regression, or it might substantially change the estimates for other variables. This page has some examples.
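A minimal simulated sketch of this "suppressor" effect, with a data-generating process invented for illustration: x1 is a noisy measure of the signal that drives y, and x2 measures only the noise. On its own x2 is essentially uncorrelated with y, yet in the regression of y on x1 and x2 it receives a large, highly significant coefficient, because it lets the model remove the noise from x1. The helper `ols` is hypothetical and computes classical (homoskedastic) t-statistics with numpy alone.

```python
import numpy as np

def ols(X, y):
    """Fit OLS with an intercept; return coefficients and their t-statistics."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # classical covariance of beta
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
x1 = signal + noise        # predictor contaminated by noise
x2 = noise                 # measures only the noise; unrelated to y on its own
y = signal + rng.normal(scale=0.5, size=n)

r_x2_y = np.corrcoef(x2, y)[0, 1]
beta, t = ols(np.column_stack([x1, x2]), y)
print(f"corr(x2, y) = {r_x2_y:.3f}")
print(f"coef x2 = {beta[2]:.2f}, t = {t[2]:.1f}")
```

Because y = x1 − x2 + error under these assumptions, the multiple regression recovers coefficients near 1 and −1. A screen based on the correlation with the DV would have discarded x2.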

There are well-established techniques for dealing with many independent variables. Choices based on knowledge of the subject matter are very important. You also have to consider the purpose of your analysis: are you interested in making predictions, or are you looking for insights into underlying mechanisms? The best way to proceed may differ between the two. This page is a useful place to start.