
I have a large data set (around 4600 rows and 10000 columns). First, I performed L2 normalization and ran PCA to obtain 50 components. Then I ran t-SNE on those components to obtain 2 components. I clustered my samples using the t-SNE output, and the result looks nice; that is, t-SNE is able to capture the clusters present in the data. Now I want to use these t-SNE components to train ML algorithms for classification and regression tasks. My questions are:
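A minimal sketch of the preprocessing described above, assuming scikit-learn and assuming that "L2 normalization" means scaling each sample (row) to unit L2 norm with `Normalizer`. A small random matrix stands in for the real 4600 x 10000 data so the sketch runs quickly; `perplexity=30` and `random_state=0` are illustrative choices, not the original settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import Normalizer

rng = np.random.default_rng(0)
# Random stand-in for the real data (the real set is ~4600 x 10000).
X = rng.normal(size=(300, 500))

# Assumption: "L2 normalization" = scale each row (sample) to unit L2 norm.
X_l2 = Normalizer(norm="l2").fit_transform(X)

# Reduce to 50 principal components before running t-SNE.
X_pca = PCA(n_components=50, random_state=0).fit_transform(X_l2)

# Embed the 50 PCA scores into 2 dimensions with t-SNE.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_pca)
print(X_tsne.shape)  # (300, 2)
```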

1- Do we need to normalize these t-SNE components before feeding them into ML algorithms?

2- I want to append additional features to these components.
Do I need to rescale these additional features to match the scale of the t-SNE components?
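For a scale-sensitive model such as k-nearest neighbours, one common pattern (assuming scikit-learn) is to scale each feature group with its own transformer inside a `Pipeline`, so the scaling parameters are learned from the training data only. The `IsotropicScaler` class below is hypothetical, written here for illustration and not part of scikit-learn: it centers the two t-SNE columns and divides both by one shared factor, which keeps the shape of the embedding, while `StandardScaler` handles the appended features. The data, labels, and column layout are toy assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class IsotropicScaler(TransformerMixin, BaseEstimator):
    """Hypothetical helper: center each column, then divide ALL columns by one
    shared factor so relative distances in the embedding are preserved."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.scale_ = (X - self.mean_).std()  # single scalar over all entries
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_


rng = np.random.default_rng(0)
n = 200
tsne_xy = rng.normal(scale=20.0, size=(n, 2))           # stand-in for the 2 t-SNE components
extra = rng.normal(loc=100.0, scale=15.0, size=(n, 3))  # hypothetical additional features
X = np.hstack([tsne_xy, extra])
y = (tsne_xy[:, 0] > 0).astype(int)                     # toy labels, illustration only

preprocess = ColumnTransformer([
    ("tsne", IsotropicScaler(), [0, 1]),     # columns 0-1: t-SNE axes
    ("extra", StandardScaler(), [2, 3, 4]),  # columns 2-4: appended features
])
model = Pipeline([
    ("prep", preprocess),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
model.fit(X, y)

X_scaled = model.named_steps["prep"].transform(X)
pred = model.predict(X)
print(X_scaled.shape, pred.shape)  # (200, 5) (200,)
```

Tree-based models (random forests, gradient boosting) split on one feature at a time, so they are generally insensitive to per-feature rescaling; the scaling step matters mainly for distance- or gradient-based models.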

This great post (https://stats.stackexchange.com/questions/369224/t-sne-on-principal-component-scores-standardization-needed) helped me understand why PCA components should not be normalized before t-SNE. Can the same reasoning be applied to my question 1? What about question 2? Could you please help me understand this?
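The reasoning from the linked post can be checked numerically. The sketch below uses toy data (an assumption, not the original set): keeping all PCA components is a pure rotation of the centered data, so pairwise distances survive, while standardizing each component rescales the axes unequally and distorts them. The second half applies the same test to a 2-D array `Y` standing in for t-SNE output. Because the t-SNE cost depends only on pairwise distances in the embedding, its axes carry no individual meaning; dividing both axes by one shared factor keeps the shape of the embedding, whereas standardizing each axis separately does not.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy data (assumption): 3 strong latent directions plus weak noise in 20 dims.
latent = rng.normal(size=(150, 3)) * np.array([10.0, 5.0, 2.0])
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(150, 20))

# Keeping all 20 components makes PCA a rotation of the centered data.
scores = PCA(n_components=20).fit_transform(X)
scores_std = StandardScaler().fit_transform(scores)  # per-component standardization

d_X = pdist(X)
pca_keeps_geometry = np.allclose(d_X, pdist(scores))
std_keeps_geometry = np.allclose(d_X, pdist(scores_std))

# Same idea for a 2-D embedding: Y stands in for the t-SNE output.
Y = rng.normal(size=(150, 2)) * np.array([30.0, 10.0])
Y_centered = Y - Y.mean(axis=0)
Y_iso = Y_centered / Y_centered.std()       # one shared factor for both axes
Y_axis = StandardScaler().fit_transform(Y)  # separate factor per axis

ratio_iso = pdist(Y_iso) / pdist(Y)
ratio_axis = pdist(Y_axis) / pdist(Y)
iso_keeps_shape = np.allclose(ratio_iso, ratio_iso[0])
axis_keeps_shape = np.allclose(ratio_axis, ratio_axis[0])

print(pca_keeps_geometry, std_keeps_geometry)  # True False
print(iso_keeps_shape, axis_keeps_shape)       # True False
```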

Thanks.

hemanta
  • t-SNE was really designed as a visualization tool. Caution is needed if using it for preprocessing, as it can introduce artifactual patterns (e.g. see [here](https://stats.stackexchange.com/questions/263539/clustering-on-the-output-of-t-sne)). Clustering t-SNE output in particular probably isn't a great idea. – user20160 Aug 25 '19 at 19:26
  • Thanks for the suggestion, user20160. What I mean is that I made a scatter plot of the t-SNE components and found clusters that I did not observe in the scatter plots of the components obtained from PCA (4 components). – hemanta Aug 25 '19 at 20:04
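One concrete side of the caution about using t-SNE for preprocessing, assuming scikit-learn's `TSNE` is the implementation used: it learns coordinates only for the samples it is fitted on and offers `fit`/`fit_transform` but no `transform()` for new samples. The sketch below, on small random data chosen for illustration, shows that embedding additional samples means rerunning t-SNE on the combined data.

```python
import numpy as np
from sklearn.manifold import TSNE

# scikit-learn's TSNE has fit / fit_transform, but no transform() for unseen samples.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
print(hasattr(tsne, "transform"))  # False

rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 5))  # toy "training" samples
X_new = rng.normal(size=(10, 5))    # toy samples arriving later

emb_train = tsne.fit_transform(X_train)

# To place X_new in the embedding, t-SNE must be rerun on the combined data,
# which generally gives different coordinates for the training points too.
emb_all = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(
    np.vstack([X_train, X_new])
)
print(emb_train.shape, emb_all.shape)  # (60, 2) (70, 2)
```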
