Coming from a machine learning background, I have long held the view: throw in all variables and let regularization and cross-validation guard against over-fitting.
The reason I am posting this is a recent study using principal component regression (PCR). Intuitively, lowering the PCR parameter (the % of variance to keep) roughly amounts to removing variables, since fewer components are retained. So a natural approach is to throw in all variables and cross-validate over the PCR parameter.
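To make the setup concrete, here is a minimal sketch of that approach in scikit-learn. The dataset (`load_diabetes`) and the grid of variance thresholds are illustrative stand-ins, not the data or settings from my actual study:

```python
# "Throw in everything" PCR: choose the fraction of variance to keep by CV.
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)  # placeholder dataset

# With 0 < n_components < 1, PCA keeps enough components to explain that
# fraction of the variance -- the "PCR parameter" in my question.
pcr = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(svd_solver="full")),
    ("ols", LinearRegression()),
])

grid = GridSearchCV(
    pcr,
    param_grid={"pca__n_components": [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)
```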
However, this approach proved sub-optimal: a later experiment showed that removing certain variables before PCR improved prediction across the board and made it more stable across PCR parameter values (shifting the entire learning curve up). This phenomenon still baffles me.
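Here is roughly the comparison I mean, sketched on the same placeholder data; the subset indices below are hypothetical, not the variables I actually removed:

```python
# CV error as a function of the variance-retained parameter, with all
# variables versus a hand-picked subset (indices are illustrative only).
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
subset = [0, 2, 3, 4, 8]  # hypothetical variables kept after selection
variance_grid = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]

def cv_curve(X_input):
    """Mean 5-fold CV MSE of PCR at each variance-retained setting."""
    curve = []
    for f in variance_grid:
        pcr = make_pipeline(StandardScaler(),
                            PCA(n_components=f, svd_solver="full"),
                            LinearRegression())
        mse = -cross_val_score(pcr, X_input, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        curve.append(mse)
    return curve

curve_all = cv_curve(X)             # all variables
curve_sub = cv_curve(X[:, subset])  # after removing some variables
for f, a, s in zip(variance_grid, curve_all, curve_sub):
    print(f"variance kept={f:.2f}  MSE(all)={a:.1f}  MSE(subset)={s:.1f}")
```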
Can anyone please comment on this, from both a theoretical and an applied perspective? In general, when you only care about prediction, would you still consider variable selection?