I have a set of candidate variables that I want to use in a multivariate regression. I am trying to reduce this set with the following procedure (using Stata):

Step 1: univariate regression (if significant, Step 2 follows)

Step 2: regression using controls (if significant, Step 3 follows)

Step 3: check against a chosen hurdle (e.g. t > 3 or p < 0.05)

Step 4: group variables according to what they are supposed to measure, e.g. there are some variables that are supposed to measure macroeconomic state (so I suppose they are potentially related).

Step 5: regress the dependent variable on the independent variables of each group, checking for multicollinearity and significance.

Step 6: use survivors of different groups for final multivariate regression.

Step 7: test multivariate regression in distinct sample period
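
To make the screening logic concrete, here is a minimal sketch of Steps 1, 3 and 7 only (univariate screen, hurdle, check in a distinct sample). It is written in Python rather than Stata, the names `univariate_t` and `screen` are hypothetical, and the data are simulated (one real predictor plus 20 pure-noise candidates), so it illustrates the idea rather than the actual analysis.

```python
import math
import random


def univariate_t(x, y):
    """t-statistic of the slope in an OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope / se


def screen(candidates, y, hurdle=3.0):
    """Keep the names of candidates whose |t| clears the hurdle (Steps 1 and 3)."""
    return [name for name, x in candidates.items()
            if abs(univariate_t(x, y)) > hurdle]


# Simulated (assumed) data: one real predictor and 20 pure-noise candidates.
rng = random.Random(0)
n = 400
signal = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * s + rng.gauss(0, 1) for s in signal]
candidates = {"signal": signal}
for j in range(20):
    candidates[f"noise_{j}"] = [rng.gauss(0, 1) for _ in range(n)]

# Split into an estimation sample and a distinct test sample (Step 7).
half = n // 2
train = {k: v[:half] for k, v in candidates.items()}
test = {k: v[half:] for k, v in candidates.items()}

survivors = screen(train, y[:half])
confirmed = screen({k: test[k] for k in survivors}, y[half:])
print("survivors:", survivors)
print("confirmed out of sample:", confirmed)
```

Lowering `hurdle` to about 2 (roughly p < 0.05) and rerunning with different seeds gives a feel for how often a noise variable passes the first screen and whether the distinct sample catches it.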

By doing so, I will lose some promising variables, as they might be significant only in combination with other variables. Do you have any hints on whether this is a feasible approach anyway, or is there a more adequate approach that I have not found yet?
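
The concern about variables that matter only in combination can be illustrated with a small simulation. This is a sketch under assumed data, not my actual variables: `x1` and `x2` share a common component `z`, and `y` depends on their difference. The helper names `t_one` and `t_two` are hypothetical, and Python is used instead of Stata.

```python
import math
import random


def t_one(x, y):
    """Slope t-statistic from regressing y on a single x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    b = sxy / sxx
    rss = sum((c - my - b * (a - mx)) ** 2 for a, c in zip(x, y))
    return b / math.sqrt(rss / (n - 2) / sxx)


def t_two(x1, x2, y):
    """Slope t-statistics from regressing y on x1 and x2 jointly (with intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # determinant of the centered X'X matrix
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(d1, d2, dy))
    sigma2 = rss / (n - 3)  # residual variance with three estimated parameters
    return b1 / math.sqrt(sigma2 * s22 / det), b2 / math.sqrt(sigma2 * s11 / det)


# Simulated (assumed) data: x1 and x2 are nearly collinear, y depends on x1 - x2.
rng = random.Random(1)
n = 200
z = [rng.gauss(0, 1) for _ in range(n)]
x1 = [zi + rng.gauss(0, 0.1) for zi in z]
x2 = [zi + rng.gauss(0, 0.1) for zi in z]
y = [a - b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

print("alone:   ", round(t_one(x1, y), 2), round(t_one(x2, y), 2))
print("together:", tuple(round(t, 2) for t in t_two(x1, x2, y)))
```

In a setup like this, each variable on its own tends to look weak while both are strongly significant jointly, so a univariate screen in Step 1 would be expected to drop them before they ever meet in Step 5. The exact numbers depend on the random draw.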

Many thanks for your answers, I hope that's a valid question and I gave enough input. If not, please let me know.

Best

Juliett

Juliett Bravo
  • This is a manual version of forward selection. It may help you to read my answer here: [Algorithms for automatic variable selection](http://stats.stackexchange.com/a/20856/7290). – gung - Reinstate Monica Jun 26 '16 at 15:08
  • Edit your post to incorporate that information so that it doesn't stay buried in the comments. – gung - Reinstate Monica Jun 27 '16 at 12:51
  • So this problem seems to be inherent to the task and there is no perfect solution to it. Does this also hold for this case here, where we don't have a case of automatic but rather manual selection with expertise also coming into play? If the two selection criteria outlined above (univariate significance and significance using a fixed set of controls) hold and the set of candidates is small enough to also use qualitative judgment to explain their (non)significance, wouldn't this be a sensible approach? In particular, if the multivariate model is afterwards tested in a distinct sample? – Juliett Bravo Jun 27 '16 at 13:50
