I run a rolling backward stepwise factor selection within each regression window, with a matrix of regressors X (137x481) and a vector Y (1x137). As you can see, the number of regressors is far higher than the number of data points.

Looking at the rolling R^2, I noticed that it is always extremely high. This led me to the conclusion that the model is overfitting the data.

The reason is probably related to the high number of covariates and the multicollinearity problems that come with it.
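A minimal sketch of why a high in-sample R^2 is expected at these dimensions, assuming simulated Gaussian noise rather than the actual data (the helper `in_sample_r2` and the random inputs are illustrative assumptions, not part of the original analysis). With more regressors than observations, least squares can reproduce Y essentially exactly even when Y is unrelated to X:

```python
import numpy as np

def in_sample_r2(X, y):
    """In-sample R^2 of a least-squares fit with an intercept.

    np.linalg.lstsq returns the minimum-norm solution when there are
    more columns than rows, so this also works for p > n.
    """
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / tss

rng = np.random.default_rng(0)
n, p = 137, 481                      # same shape as in the question
X = rng.standard_normal((n, p))      # simulated regressors (assumption)
y = rng.standard_normal(n)           # pure noise, independent of X

r2_all = in_sample_r2(X, y)          # all 481 regressors
r2_few = in_sample_r2(X[:, :5], y)   # only 5 regressors

print(f"R^2 with {p} regressors: {r2_all:.4f}")
print(f"R^2 with 5 regressors:   {r2_few:.4f}")
```

In this sketch the fit with all regressors interpolates the noise, while the fit with only a handful does not, which is consistent with the improvement seen when the number of covariates is reduced.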

Why does stepwise regression have this problem when there are many covariates? After reducing the number of covariates, I noticed an improvement. How can I effectively avoid this problem while maintaining a rolling selection criterion based on statistical significance? Doing a before will make any sense?

Thanks for your help.

  • I'm seconding the vote to close as duplicate. If you feel, after reading the linked question, your question highlights points not covered there, please let us know. – Matthew Drury Aug 07 '17 at 17:50
  • I think the question is clear but does not fully shed light on the last point, that is, how to avoid the multicollinearity problem within a rolling framework with stepwise selection. – Federico Frisaldi Aug 07 '17 at 20:16
  • To address that point, Federico, which indeed is the most novel and interesting one, you first need to explain what your "rolling framework" is attempting to accomplish. Do you believe that some variables belong in the model at some times and then at later times do not? By what mechanism should they be dropped and added? Should this be done suddenly or gradually as the window is moved? Or do you seek variables that ought to be part of the model for all windows simultaneously? – whuber Aug 07 '17 at 22:51

0 Answers