I have a dataset that contains 95 highly correlated continuous variables and 3 categorical variables. I want to reduce the dimensionality of the data, which should also take care of the correlation. I know that I cannot apply PCA to categorical data, as they have no concept of variance. I have read about Multiple Factor Analysis, but I do not feel confident about it. Can I apply PCA to the continuous variables to reduce the dimensionality and keep the categorical variables as they are?

The data are sensor data, and my application is predicting machine failure.

Thank you, Arch

Arch Desai
  • Re "Can I": the software will do that as readily as anything else. In a comment to [your previous question](https://stats.stackexchange.com/questions/439356) I pointed out why this could lead to a poor analysis. – whuber Dec 05 '19 at 18:46
  • You write "I know that I can not apply PCA on categorical data", but are you sure? See [this thread](https://stats.stackexchange.com/questions/5774/can-principal-component-analysis-be-applied-to-datasets-containing-a-mix-of-cont) for some insight on this (and some alternatives). – Peter Flom Dec 06 '19 at 14:41

1 Answer

You can first split your dataset into its continuous and categorical variables. Then apply PCA to the continuous part and reduce its dimensionality, for example to 10 components. Finally, merge the components with your categorical variables. You end up with a 13-dimensional dataset (10 principal components plus the 3 categorical variables).
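The steps above might be sketched as follows. This is a minimal illustration, not a tested recipe for your data: it uses only NumPy, computes PCA through an SVD of the standardized columns, and runs on synthetic data standing in for the sensor readings. The helper `pca_reduce`, the array names, and the integer coding of the categorical columns are all assumptions made for the example.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Standardize the columns of X, then project onto the top principal components."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - mean) / std
    # SVD of the standardized data: the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]

# Synthetic stand-in for the real data (hypothetical, for illustration only):
# 95 correlated continuous columns driven by 5 latent factors, plus 3
# integer-coded categorical columns.
rng = np.random.default_rng(0)
n_rows = 500
latent = rng.normal(size=(n_rows, 5))
X_cont = latent @ rng.normal(size=(5, 95)) + 0.1 * rng.normal(size=(n_rows, 95))
X_cat = rng.integers(0, 4, size=(n_rows, 3))

# PCA on the continuous block only, then merge with the categorical block
scores, explained = pca_reduce(X_cont, n_components=10)
X_reduced = np.hstack([scores, X_cat])

print(X_reduced.shape)   # 10 components + 3 categorical columns
print(explained.sum())   # share of variance retained by the 10 components
```

Two caveats apply. Most models need the categorical columns encoded (for example one-hot), which pushes the final column count above 13. And for a prediction task, the standardization and the principal directions should be estimated on the training data only and then applied to the test data.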

Batuhan B
  • Thank you. Is this common practice in industry? I did not see any examples online. – Arch Desai Dec 05 '19 at 20:24
  • I am not sure whether it is common practice, but I don't think there is a problem with this approach. You just create new features from the continuous ones and combine them with your categorical features. – Batuhan B Dec 05 '19 at 21:53