I don't have a very strong statistical background, and I'm new to data science...
Now I am practicing PCA (Principal Component Analysis) for dimension reduction. This tutorial (PCA Dimension Reduction Tutorial) looks very complete, but I got confused by one step.
Before applying PCA in R or Python, all categorical data has to be converted to numerical data. The tutorial uses one-hot encoding, so a column with several distinct values is split into several columns. For example, if a column called Outlet_TypeSupermarket originally has 3 values (Type 1, Type 2, Type 3), after one-hot encoding it becomes 3 columns: Outlet_TypeSupermarket Type 1, Outlet_TypeSupermarket Type 2 and Outlet_TypeSupermarket Type 3. They do this for every categorical column and then run PCA on all of the resulting columns.
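For reference, here is a minimal sketch of the encode-then-PCA step described above, assuming toy data (the values and the `Item_Weight` column are hypothetical) and using only NumPy so that each step is visible. In practice `pandas.get_dummies` and `sklearn.decomposition.PCA` do the same two steps; this is not the tutorial's exact code.

```python
import numpy as np

# Hypothetical toy data: one categorical column and one numeric column.
outlet_type = np.array(["Type 1", "Type 2", "Type 3", "Type 1", "Type 2", "Type 3"])
item_weight = np.array([9.3, 5.9, 17.5, 19.2, 8.9, 10.4])  # hypothetical column


def one_hot(values, prefix):
    """Return (matrix, column_names) with one 0/1 column per distinct value."""
    categories = sorted(set(values))
    matrix = np.column_stack([(values == c).astype(float) for c in categories])
    names = [f"{prefix} {c}" for c in categories]
    return matrix, names


# One-hot encode the categorical column and append the numeric column.
dummies, dummy_names = one_hot(outlet_type, "Outlet_TypeSupermarket")
X = np.column_stack([dummies, item_weight])
column_names = dummy_names + ["Item_Weight"]

# Standardize each column, then run PCA via the SVD of the standardized matrix.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)
scores = X_std @ Vt.T  # rows projected onto the principal components

print(column_names)
print(np.round(explained_variance_ratio, 3))
```

Note that the one-hot columns of a single variable always sum to 1, so they are linearly dependent; one component therefore carries (numerically) zero variance here.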
Finally, in this case, even if PCA chooses the 30 most important components (important columns), it may be using only some of the original columns. For example, it may use only Outlet_TypeSupermarket Type 1 and Outlet_TypeSupermarket Type 2 from the original Outlet_TypeSupermarket column.
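One way to check what a kept component is actually made of is to print its loadings, i.e. the weight it gives to each encoded column. The sketch below assumes hypothetical data (random one-hot codes for Outlet_TypeSupermarket plus two made-up numeric columns, `Item_Weight` and `Item_MRP`) and keeps 2 components instead of the tutorial's 30.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoded data: 3 one-hot columns for Outlet_TypeSupermarket
# plus 2 made-up numeric columns, 50 rows.
n = 50
codes = rng.integers(0, 3, size=n)
dummies = np.eye(3)[codes]  # each row is a one-hot vector
numeric = rng.normal(size=(n, 2))
X = np.column_stack([dummies, numeric])
column_names = [
    "Outlet_TypeSupermarket Type 1",
    "Outlet_TypeSupermarket Type 2",
    "Outlet_TypeSupermarket Type 3",
    "Item_Weight",  # hypothetical
    "Item_MRP",  # hypothetical
]

# PCA via SVD of the centered data.
X_centered = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 2  # number of components kept (the tutorial keeps 30)
loadings = Vt[:k]  # shape (k, n_columns): one weight per encoded column

# Each kept component has a weight for every encoded column.
for i, component in enumerate(loadings, start=1):
    print(f"PC{i}:")
    for name, weight in zip(column_names, component):
        print(f"  {name:32s} {weight:+.3f}")
```

Each row of `loadings` is a unit-length vector with one entry per encoded column, so a component is a weighted combination of the columns rather than a selection of some of them.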
Is this the right way to do dimension reduction? I thought the chosen columns would at least be complete columns from the original dataset (that is, all of the one-hot columns that came from the same original column)... If this is the correct way, could you explain why?