I run a rolling backward stepwise factor selection within each regression window, with a matrix of regressors X(137x481) and a vector Y(1x137). As you can see, the number of regressors is far higher than the number of data points.
Looking at the rolling R^2, I noticed that it is always extremely high. This led me to the conclusion that the model is overfitting the data.
The reason is probably related to the high number of covariates and the multicollinearity problems that come with it.
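
As a minimal illustration of the overfitting concern, the sketch below uses synthetic noise with the same shape as one window (137 observations, 481 regressors), not the actual data. The function name `in_sample_r2` and the seed are assumptions made only for this example. It fits plain least squares (no stepwise selection) to a response that is unrelated to the regressors, and compares the in-sample R^2 with the one obtained from only 10 noise regressors.

```python
import numpy as np

def in_sample_r2(n_obs, n_regressors, seed=0):
    """In-sample R^2 of least squares fitted to pure noise (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Regressors and response are independent draws: there is nothing to explain.
    X = rng.standard_normal((n_obs, n_regressors))
    y = rng.standard_normal(n_obs)
    # Add an intercept column so that R^2 has its usual interpretation.
    X = np.column_stack([np.ones(n_obs), X])
    # lstsq returns the minimum-norm solution when there are more columns than rows.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

r2_wide = in_sample_r2(137, 481)   # more regressors than observations
r2_narrow = in_sample_r2(137, 10)  # far fewer regressors than observations
print(f"R^2 with 481 noise regressors: {r2_wide:.4f}")
print(f"R^2 with 10 noise regressors:  {r2_narrow:.4f}")
```

With more regressors than observations, the fit can reproduce the response exactly, so the in-sample R^2 is essentially 1 even though the regressors carry no information. This is only a sketch of the unrestricted fit; a stepwise procedure starts from or searches within this same over-parameterized space.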
Why does stepwise regression have this problem when there is so much data? When I reduced the number of covariates, I noticed an improvement. How can I effectively avoid this problem while maintaining a rolling selection criterion based on statistical significance? Doing a before will make any sense?
Thanks for your help.