I have 371 samples, each with 20,000+ attributes. The data are all numeric. I want to see whether my samples can be clustered, so I first decided to reduce the dimensionality of my data with PCA. From PCA, I got 250 principal components that account for around 90% of the variance.
I am now wondering whether I can compute my 371x371 distance matrix from these 250 principal components and then use it for hierarchical clustering. I tried computing the distance matrix on all 20,000+ attributes, but it took too long. I think that using the 250 principal components instead would speed up the distance matrix calculation.
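The workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses NumPy and SciPy, Euclidean distance, average linkage, and a cut into 3 clusters, none of which are specified in the question, and the random matrix `X` is a smaller stand-in for the real data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

# Stand-in data (assumption): 371 samples x 2,000 attributes,
# smaller than the real 20,000+ attributes to keep the demo fast.
rng = np.random.default_rng(0)
X = rng.normal(size=(371, 2000))

# PCA via SVD of the column-centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Smallest number of components whose cumulative variance reaches 90%.
explained = s**2 / np.sum(s**2)
k = int(np.searchsorted(np.cumsum(explained), 0.90) + 1)

# Principal component scores for the first k components (371 x k).
scores = U[:, :k] * s[:k]

# Pairwise Euclidean distances on the scores.
d = pdist(scores, metric="euclidean")  # condensed form, used by linkage
D = squareform(d)                      # full 371 x 371 matrix

# Hierarchical clustering; linkage method and cluster count are assumptions.
Z = linkage(d, method="average")
labels = fcluster(Z, t=3, criterion="maxclust")
```

Here `D` is the 371x371 distance matrix built from the retained components only, and `labels` holds one cluster label per sample. The number of components `k` depends on the data, so it will not be 250 for the stand-in matrix.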
So my question is: can principal components be used for calculating a distance matrix for hierarchical clustering? Is this valid mathematically?
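For reference, the relationship between the two distances can be written out explicitly. This sketch assumes Euclidean distance and PCA on the centered (not otherwise rescaled) data; the symbols $x_i$, $z_{im}$, $r$ and $k$ are introduced here for illustration. Let $x_i$ be the attribute vector of sample $i$, let $z_{im}$ be its score on principal component $m$, and let $r \le 370$ be the number of components with nonzero variance (371 centered samples span at most 370 dimensions). Then

$$
\lVert x_i - x_j \rVert^2 \;=\; \sum_{m=1}^{r} (z_{im} - z_{jm})^2
\;=\; \underbrace{\sum_{m=1}^{k} (z_{im} - z_{jm})^2}_{\text{distance on the first } k \text{ components}}
\;+\; \underbrace{\sum_{m=k+1}^{r} (z_{im} - z_{jm})^2}_{\text{discarded part}} .
$$

So with all $r$ components the distances are reproduced exactly, and with $k = 250$ each squared distance can only shrink. Because $\sum_{i<j} \lVert x_i - x_j \rVert^2 = n \sum_i \lVert x_i - \bar{x} \rVert^2$, components that retain 90% of the variance retain 90% of the sum of squared pairwise distances, although an individual pair may lose more or less than that.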
*Additional info: I have tried using principal components to cluster the iris data in R. The result is quite good, although the accuracy is only 78%. I think this is because the dimension is really low (only 3), so if I reduce it to 2 dimensions, they cannot cover more than 90% of the variance.