
I am running a series of hierarchical regressions with a large number of independent variables (IVs). All the IVs have a loose theoretical relationship to the dependent variable (DV). My supervisor has suggested excluding IVs from the regression based on whether they correlate with the DV (if they don't, they are out). This approach seems accepted enough that studies using it are published in high-end medical journals, but I can't find a reference that directly supports it.

I've also received some negative feedback about this process. First, because some of the excluded IVs are correlated with the included IVs, it's been suggested that excluding them will affect the coefficients of the IVs left in the regression. Second, including only those measures expected to be significant masks the likelihood that a few significant correlations will emerge by chance.

I have found a reference suggesting that if the variables are not 'important', then excluding variables that correlate with other IVs is not a problem.

Can an IV's importance to a model be determined by whether it correlates with the DV?

kvella

1 Answer


An independent variable that is not correlated on its own with the dependent variable may still be very important in a model. Multiple regression estimates the relation of each independent variable to the dependent variable with all the other variables taken into account. As a result, such a variable might become significant in the multiple regression, or it might substantially influence the results for other variables. This page has some examples.
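To make that concrete, here is a small simulated sketch of a "suppressor" variable. It is not taken from the linked page; the data-generating process, the function name `simulate_suppressor`, and the variable names are all assumptions made up for illustration. `x2` has essentially zero correlation with `y`, yet it gets a large coefficient in the multiple regression, and leaving it out changes the coefficient on `x1`.

```python
import numpy as np


def simulate_suppressor(n=5000, seed=0):
    """Hypothetical simulation: x2 is uncorrelated with y but matters in the model."""
    rng = np.random.default_rng(seed)

    signal = rng.normal(size=n)    # the part of x1 that actually drives y
    nuisance = rng.normal(size=n)  # contaminates x1, unrelated to y

    x1 = signal + nuisance         # noisy measurement of the signal
    x2 = nuisance                  # measures only the contamination
    y = signal + 0.5 * rng.normal(size=n)

    # Marginal (zero-order) correlation between x2 and y: close to zero.
    r_x2_y = np.corrcoef(x2, y)[0, 1]

    # Ordinary least squares with both predictors (first column = intercept).
    X_full = np.column_stack([np.ones(n), x1, x2])
    coef_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

    # Ordinary least squares after screening out x2 for "not correlating".
    X_reduced = np.column_stack([np.ones(n), x1])
    coef_reduced, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)

    return r_x2_y, coef_full, coef_reduced


r_x2_y, coef_full, coef_reduced = simulate_suppressor()
print("corr(x2, y):", round(r_x2_y, 3))
print("full model [intercept, x1, x2]:", np.round(coef_full, 2))
print("reduced model [intercept, x1]:", np.round(coef_reduced, 2))
```

In this setup `y` is, by construction, roughly `x1 - x2` plus noise, so the full model should recover coefficients near 1 and -1, while the screened model should shrink the `x1` coefficient to about 0.5. Real data will rarely be this clean, but the mechanism is the same: a zero-order correlation says nothing about a variable's contribution once the other variables are in the model.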

There are well-established techniques for dealing with multiple independent variables. Choices based on knowledge of the subject matter are very important. You also have to consider the purpose of your analysis: are you interested in making predictions, or are you looking for insights about underlying mechanisms? The best way to proceed may differ between those two goals. This page is a useful place to start.

EdM