Note: This is a revision of my original question.
I have read the critique of stepwise variable selection and "all possible subsets regression" by Professor Frank Harrell here.
Are variable selection and loading via factor analysis, principal component analysis, or structural equation modelling more reliable than choosing the best model from a set of dependent and independent variables by stepwise methods or "all possible subsets regression"? Or do they suffer from similar problems?
Let's assume we have developed the following two models (x5 is not correlated with y1).
M1 is the result of all possible subsets regression.
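The post does not say which criterion was used to pick M1 from all the subsets. For concreteness, here is a minimal sketch of all possible subsets regression assuming AIC as the criterion, with simulated data standing in for the real x1..x5 and y1 (the data, effect sizes, and function names are hypothetical).

```python
import itertools
import numpy as np

def ols_aic(X, y):
    """Fit OLS with an intercept; return (Gaussian AIC up to a constant, coefficients)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])       # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = float(np.sum((y - Xd @ beta) ** 2))
    k = Xd.shape[1]                             # number of estimated coefficients
    return n * np.log(rss / n) + 2 * k, beta

def all_subsets(X, y):
    """Fit every non-empty subset of the columns of X; keep the lowest AIC (assumed criterion)."""
    p = X.shape[1]
    best = None
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            cols = list(cols)
            aic, beta = ols_aic(X[:, cols], y)
            if best is None or aic < best[0]:
                best = (aic, cols, beta)
    return best

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))                     # hypothetical x1..x5
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)  # x5 (column 4) unrelated to y
aic, cols, beta = all_subsets(X, y)
print("selected columns:", cols)
```

With p predictors this fits 2^p - 1 models and reports only the winner, which is where the optimism Harrell criticizes comes from: the chosen model's fit statistics ignore that it was the best of many tries.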
To get M2, we run a PCA, keep the first two components, and then use those components as the predictors in a two-variable regression.
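A minimal sketch of how M2 could be built as a principal component regression, assuming the IVs are standardized before the PCA (the post does not say whether the correlation or covariance matrix was used). The data and helper names (`pcr_fit`, `pcr_predict`) are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components=2):
    """PCA on the standardized IVs, then OLS of y on the first n_components scores."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                           # center and scale each IV to unit variance
    # Rows of Vt are the principal axes; they are computed from X alone, never from y
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components].T                     # p x k loading matrix
    C = Z @ W                                   # component scores (C1, C2)
    Cd = np.column_stack([np.ones(len(y)), C])
    gamma, *_ = np.linalg.lstsq(Cd, y, rcond=None)
    explained = s**2 / np.sum(s**2)             # share of IV variance per component
    return {"mu": mu, "sd": sd, "W": W, "gamma": gamma, "explained": explained}

def pcr_predict(model, X_new):
    """Project new IVs onto the stored loadings and apply the fitted regression."""
    C = ((X_new - model["mu"]) / model["sd"]) @ model["W"]
    return model["gamma"][0] + C @ model["gamma"][1:]

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))                     # hypothetical x1..x5
X[:, 1] += X[:, 0]                              # make x1 and x2 correlated
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)
m2 = pcr_fit(X, y, n_components=2)
print("variance explained by C1, C2:", m2["explained"][:2])
print("loadings (columns = C1, C2):\n", m2["W"].round(2))
```

Because the components are chosen to capture variance in the IVs rather than association with y, only two parameters are estimated against y; the flip side is that nothing guarantees C1 and C2 are the directions most related to y1.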
I would like to interpret C1 and C2 as the two components that summarize the measured IVs, and to benefit from the resulting dimension reduction.
- Will I overcome the problems of "all possible subsets regression"?
- Which one will have better predictive validity?
- What are the comparative strengths and weaknesses of the two models?
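One way to approach the predictive-validity question empirically is a sketch like the following: K-fold cross-validation in which the *entire* procedure (the subset search for M1, the PCA for M2) is refit on every training fold. It assumes AIC for the subset search and standardized IVs for the PCA; the simulated data and all function names are hypothetical, and the result on real data could go either way.

```python
import itertools
import numpy as np

def ols(X, y):
    """OLS with an intercept; returns [intercept, slopes...]."""
    beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)
    return beta

def predict(beta, X):
    return beta[0] + X @ beta[1:]

def fit_best_subset(X, y):
    """M1-style: search all subsets, keep the lowest AIC (assumed criterion)."""
    n, p = X.shape
    best = None
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            cols = list(cols)
            beta = ols(X[:, cols], y)
            rss = float(np.sum((y - predict(beta, X[:, cols])) ** 2))
            aic = n * np.log(rss / n) + 2 * (len(cols) + 1)
            if best is None or aic < best[0]:
                best = (aic, cols, beta)
    _, cols, beta = best
    return lambda Xn: predict(beta, Xn[:, cols])

def fit_pcr(X, y, k=2):
    """M2-style: PCA on standardized X (ignores y), then OLS on the first k scores."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd((X - mu) / sd, full_matrices=False)
    W = Vt[:k].T
    beta = ols(((X - mu) / sd) @ W, y)
    return lambda Xn: predict(beta, ((Xn - mu) / sd) @ W)

def cv_mse(fit, X, y, n_folds=10, seed=0):
    """K-fold CV; the whole fitting procedure, selection included, runs on each training fold."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        model = fit(X[train], y[train])
        errs.append(np.mean((y[test] - model(X[test])) ** 2))
    return float(np.mean(errs))

# Hypothetical data: two latent factors behind x1..x4, x5 is pure noise
rng = np.random.default_rng(1)
n = 150
latent = rng.normal(size=(n, 2))
X = np.column_stack([latent[:, 0] + 0.5 * rng.normal(size=n),
                     latent[:, 0] + 0.5 * rng.normal(size=n),
                     latent[:, 1] + 0.5 * rng.normal(size=n),
                     latent[:, 1] + 0.5 * rng.normal(size=n),
                     rng.normal(size=n)])
y = latent[:, 0] - latent[:, 1] + rng.normal(size=n)

mse_subsets = cv_mse(fit_best_subset, X, y)
mse_pcr = cv_mse(fit_pcr, X, y)
print("all possible subsets, CV MSE:", mse_subsets)
print("PCR with 2 components, CV MSE:", mse_pcr)
```

Running the subset search once on all the data and cross-validating only the final model would give an optimistic error estimate, because the selection step itself would never be tested on held-out data.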