Say I have 20 predictor variables (X's) and 1 response variable (Y), and I'm trying to build a supervised model y = f(x). Is it advisable, or at least "OK", to first run PCA on all of the predictor variables and, if say 3 PCs account for ~70% of the overall variance, use these 3 new PC variables as the predictors for the supervised learning?

Does this violate any rule, given that PCs are just linear combinations of all the original predictor variables and are uncorrelated with one another?
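
To make the setup concrete, here is a minimal sketch of the procedure described above. It assumes NumPy only, synthetic data, and ordinary least squares as the supervised model; the variable names (`scores`, `explained_ratio`, `coef`) and the data-generating process are illustrative assumptions, not part of any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 20, 3  # samples, original predictors, PCs kept

# Synthetic data (assumption): 20 correlated predictors driven by 3 latent factors
latent = rng.normal(size=(n, k))
loadings = rng.normal(size=(k, p))
X = latent @ loadings + 0.5 * rng.normal(size=(n, p))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=n)

# Standardize the predictors so no variable dominates because of its scale
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD of the standardized data; rows of Vt are the principal directions
U, s, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)

# Project onto the first k directions: these columns are the new predictors
scores = X_std @ Vt[:k].T

# The PC scores are mutually uncorrelated (identity correlation matrix)
pc_corr = np.corrcoef(scores, rowvar=False)

# Fit y on the k PC scores with an intercept (ordinary least squares)
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print("variance explained by first", k, "PCs:", explained_ratio[:k].sum())
print("intercept and PC coefficients:", coef)
```

Note that PCA never looks at Y, so the components that explain the most variance in X are not guaranteed to be the ones most useful for predicting Y.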

Paul.

PaulB.
  • I recommend you first search this site, because questions like this (`pca predictor variables`, `pca independent variables`) have been answered several times already. – ttnphns Jul 13 '17 at 08:39
  • The comments answer the base question of being able to use the PCs for further analysis, but do give a thought to the interpretability of your supervised learning fits (particularly if it's regression), since it would now be transformed to the component space. – DivyaJyoti Rajdev Jul 13 '17 at 09:23
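
On the interpretability point, one possible way to read a linear fit made in component space is to map its coefficients back onto the original (standardized) predictors. The sketch below assumes NumPy, synthetic data, and a plain least-squares regression; `gamma`, `beta_std`, and the other names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 150, 20, 3  # samples, original predictors, PCs kept

# Synthetic correlated predictors and a response (assumption, for illustration)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

# Standardize, then get the principal directions from the SVD
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
V_k = Vt[:k].T        # p x k matrix of loadings for the first k PCs
scores = Z @ V_k      # n x k matrix of PC scores

# Regress centered y on the PC scores (scores are already centered)
y_mean = y.mean()
gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)

# Each PC is a linear combination of the standardized predictors, so the
# implied coefficients on those predictors are V_k @ gamma
beta_std = V_k @ gamma

# Both parameterizations give identical fitted values
pred_pc = y_mean + scores @ gamma
pred_orig = y_mean + Z @ beta_std
print(np.allclose(pred_pc, pred_orig))
```

The back-transformed coefficients apply to the standardized predictors, and every original variable generally receives a nonzero weight, which is part of why the fit is harder to interpret than a regression on a few original variables.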

0 Answers