I'm wondering whether to use OneHotEncoding before using PCA.
I did some googling and it seems the answer is no, because PCA doesn't work well on binary data (this is the source).
I just want to confirm that my current understanding is correct:
- Numerical features should be standardized (by `StandardScaler`) before applying PCA
- Categorical and ordinal features should be encoded to integers (by `LabelEncoder`)
- Related to the 2nd point: categorical features shouldn't use OneHotEncoding (by `get_dummies`)
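
To make the three points above concrete, here is a minimal sketch of the preprocessing I have in mind. The toy DataFrame and its column names (`age`, `income`, `city`) are made up purely for illustration, and I'm not claiming this is the right approach; it is exactly what I'd like checked.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data; the column names and values are hypothetical.
df = pd.DataFrame({
    "age": [23, 35, 47, 51, 62],
    "income": [30.0, 52.0, 61.0, 75.0, 48.0],
    "city": ["Paris", "Rome", "Paris", "Oslo", "Rome"],
})

# Point 1: standardize the numerical features (zero mean, unit variance).
num_scaled = StandardScaler().fit_transform(df[["age", "income"]])

# Point 2: encode the categorical feature as integers.
city_int = LabelEncoder().fit_transform(df["city"])

# Point 3: the one-hot alternative that I believe should be avoided.
# It is computed here only for comparison and is not fed into PCA.
city_onehot = pd.get_dummies(df["city"])

# Combine the scaled numericals with the integer-encoded column, then run PCA.
X = pd.DataFrame(num_scaled, columns=["age", "income"])
X["city"] = city_int
components = PCA(n_components=2).fit_transform(X)
```

Note that in this sketch the integer-encoded `city` column is not standardized, so it sits on a different scale from the other two columns; I'm unsure whether that is acceptable either.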
Please kindly enlighten me.
Thanks for the help!