
I'm wondering whether to use OneHotEncoding before using PCA.

I did some googling, and it seems the answer is no, because PCA doesn't work well on binary data (this is the source).

I just want to check that my current understanding is correct:

  1. Numerical features should be standardized (with StandardScaler) before applying PCA.
  2. Categorical and ordinal features should be encoded as integers (with LabelEncoder).
  3. Related to point 2: categorical features shouldn't be one-hot encoded (with get_dummies).
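Point 1 can be checked numerically. The following is only a minimal sketch on made-up data (the three synthetic columns and their scales are assumptions, not from any real dataset): it suggests that PCA on unscaled features is dominated by whichever column has the largest variance, while PCA after StandardScaler gives the same explained-variance ratios as an eigendecomposition of the correlation matrix.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical numeric features on very different scales.
X = np.column_stack([
    rng.normal(0, 1, 200),
    rng.normal(0, 100, 200),
    rng.normal(0, 0.01, 200),
])
X[:, 1] += 50 * X[:, 0]  # make the first two columns correlated

# PCA on the raw data: the large-scale column dominates the first component.
raw_ratio = PCA().fit(X).explained_variance_ratio_

# PCA on standardized data.
X_std = StandardScaler().fit_transform(X)
std_ratio = PCA().fit(X_std).explained_variance_ratio_

# Eigenvalues of the correlation matrix, sorted in descending order.
corr_eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
corr_ratio = corr_eigvals / corr_eigvals.sum()

print("raw:         ", raw_ratio)
print("standardized:", std_ratio)
print("correlation: ", corr_ratio)
```

The last two printed lines should agree, which is one way to see that standardizing first amounts to running PCA on the correlation matrix rather than the covariance matrix.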

Please kindly enlighten me

Thanks for the help

Blaze Tama
  • See https://stats.stackexchange.com/questions/16331/doing-principal-component-analysis-or-factor-analysis-on-binary-data. I believe that addresses (3). Question (1) is answered at https://stats.stackexchange.com/questions/53/pca-on-correlation-or-covariance. (2) is hard to follow: if you *don't* encode your data numerically, then how do you propose to perform any kind of numerical analysis? – whuber Oct 28 '17 at 15:16
  • To pt. 2. PCA is a method for quantitative variables, not categorical (nominal or ordinal) ones. There is multiple correspondence analysis (MCA), which is like PCA but for nominal data. Categorical PCA (CatPCA) is a more general technique that allows ordinal and mixed-type variables. – ttnphns Oct 28 '17 at 16:13
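
To make the MCA suggestion concrete, here is a rough from-scratch sketch using only NumPy. It treats MCA as correspondence analysis of the indicator (one-hot) matrix, which is one common formulation; it is not CatPCA and not any particular library's API. The `colour` and `size` columns and the `indicator` helper are hypothetical, made up for illustration.

```python
import numpy as np

# Hypothetical nominal data: two categorical columns, six rows.
colour = np.array(["red", "blue", "red", "green", "blue", "green"])
size = np.array(["S", "L", "S", "M", "L", "M"])

def indicator(col):
    """One-hot (indicator) matrix for a 1-D array of category labels."""
    cats, codes = np.unique(col, return_inverse=True)
    return np.eye(len(cats))[codes]

# Indicator matrix: one block of dummy columns per variable.
Z = np.hstack([indicator(colour), indicator(size)])

# Correspondence analysis of Z.
P = Z / Z.sum()              # correspondence matrix
r = P.sum(axis=1)            # row masses
c = P.sum(axis=0)            # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals

U, sing, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates of the rows (observations) and inertia per axis.
row_coords = (U * sing) / np.sqrt(r)[:, None]
inertia = sing ** 2

print(np.round(row_coords[:, :2], 3))
print(np.round(inertia, 3))
```

The difference from plain PCA on dummy variables is the weighting: residuals are scaled by the row and column masses, so rare categories are not treated the same way as common ones.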

0 Answers