I learned that it's common to do dimensionality reduction before clustering.
But is there any situation in which it is better to do clustering first, and then dimensionality reduction?
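For reference, the ordering the question calls common can be sketched as a two-step pipeline. This sketch assumes PCA for the reduction and k-means for the clustering, though other pairings are possible. The synthetic data, the helper names `pca_reduce` and `kmeans`, and the farthest-first seeding are our own illustrative choices. Everything is written in plain NumPy, so no particular library's API is implied.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its leading principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)                          # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                  # component scores

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-first seeding; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next seed: the point farthest from every seed chosen so far
        d = np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=2).min(axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # assign each point to its nearest centroid (Euclidean distance)
        labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        # move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical data: 3 groups of 100 points in 50 dimensions.
rng = np.random.default_rng(42)
centers = rng.normal(scale=5.0, size=(3, 50))
X = np.vstack([c + rng.normal(size=(100, 50)) for c in centers])
true_groups = np.repeat(np.arange(3), 100)

Z = pca_reduce(X, n_components=2)   # 1) reduce dimensionality
labels, _ = kmeans(Z, k=3)          # 2) cluster in the reduced space
```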
- We do not always do, or need, dimensionality reduction prior to clustering. Reducing dimensions helps against the curse-of-dimensionality problem, from which Euclidean distance, for example, suffers. On the other hand, important cluster separation might sometimes take place in dimensions with weak variance, so things like PCA may be somewhat dangerous to do. It is a double-edged problem. – ttnphns Jan 14 '17 at 09:14
- Check some links: http://stats.stackexchange.com/q/29084/3277; http://stats.stackexchange.com/q/12853/3277; http://stats.stackexchange.com/q/157621/3277 – ttnphns Jan 14 '17 at 09:15
- Thank you so much. I think I chose the wrong direction. :) – user145177 Jan 15 '17 at 07:54
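The caveat in the first comment can be made concrete with a small synthetic case, assuming PCA as the reduction step. One feature has large variance but no cluster structure. The other has small variance but separates two groups cleanly. The data, the `separation` helper (gap between group means divided by the pooled within-group standard deviation), and all parameter values are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical 2-D data: feature 0 is large-variance noise shared by both groups,
# feature 1 has low variance but carries all of the cluster structure (groups at -1 and +1).
group = np.repeat([0, 1], n)
x0 = rng.normal(scale=10.0, size=2 * n)
x1 = np.where(group == 0, -1.0, 1.0) + rng.normal(scale=0.1, size=2 * n)
X = np.column_stack([x0, x1])

# PCA down to one component: keep only the direction of largest variance.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]
z = Xc @ pc1

def separation(values, group):
    """Gap between the two group means divided by the pooled within-group std."""
    a, b = values[group == 0], values[group == 1]
    pooled = np.sqrt((a.var() + b.var()) / 2)
    return abs(a.mean() - b.mean()) / pooled

print("first principal component:", np.round(pc1, 3))
print("separation along feature 1:", round(separation(X[:, 1], group), 1))
print("separation after PCA to 1-D:", round(separation(z, group), 1))
```

On data like this, the first principal component points almost entirely along the noisy feature. Projecting to one dimension therefore discards exactly the feature that separated the groups.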
1 Answer
Clustering generally depends on some sort of distance measure: points near each other are in the same cluster, and points far apart are in different clusters. But in high-dimensional spaces, distance measures do not work very well, because distances tend to concentrate. The nearest and farthest neighbors of a point end up at nearly the same distance, so "near" and "far" become hard to tell apart. There is a long and excellent discussion of that here. You reduce the number of dimensions first so that your distance metric will make sense.
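Here is a small numerical illustration of the distance-concentration effect, assuming points drawn uniformly from the unit hypercube. For one random query point, it reports (max distance - min distance) / min distance over a random sample. The helper name `relative_contrast` and the sample sizes are our own choices.

```python
import numpy as np

def relative_contrast(n_points, dim, seed=0):
    """(max - min) / min of the distances from one random query to n_points random points."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_points, dim))   # sample points in the unit hypercube
    query = rng.uniform(size=dim)
    d = np.linalg.norm(X - query, axis=1)   # Euclidean distances to the query
    return (d.max() - d.min()) / d.min()

dims = (2, 10, 100, 1000)
for dim in dims:
    print(f"dim={dim:5d}  relative contrast={relative_contrast(500, dim):.3f}")
```

The ratio shrinks sharply as the dimension grows. In other words, the nearest and farthest points become relatively similar in distance, which is why distance-based clustering in the raw high-dimensional space can be unreliable.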
- And... I know clustering methods like K-means are not good in high-dimensional spaces because of computational complexity, but I still wonder: are there any special cases where it is better to do clustering before dimensionality reduction? Is it possible? Thank you.^^ – user145177 Jan 14 '17 at 05:04
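The thread gives no concrete case of the reverse order, so what follows is only our own illustration under stated assumptions. First, cluster with k-means in the original space, so that distances use all features. Then reduce afterwards, for example for plotting, by projecting onto the (k - 1)-dimensional plane through the cluster centroids. The data, the `kmeans` helper, and the centroid-plane projection are hypothetical choices, not something proposed in the thread.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-first seeding; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next seed: the point farthest from every seed chosen so far
        d = np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=2).min(axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # assign each point to its nearest centroid (Euclidean distance)
        labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical data: 3 groups of 80 points in 20 dimensions.
rng = np.random.default_rng(1)
centers = rng.normal(scale=4.0, size=(3, 20))
X = np.vstack([c + rng.normal(size=(80, 20)) for c in centers])

# 1) Cluster first, in the full 20-D space.
labels, centroids = kmeans(X, k=3)

# 2) Reduce afterwards: an orthonormal basis for the span of the centered centroids
#    (at most k - 1 = 2 directions), taken from the left singular vectors.
U, _, _ = np.linalg.svd((centroids - centroids.mean(axis=0)).T, full_matrices=False)
basis = U[:, :2]                      # shape (20, 2)
Z = (X - X.mean(axis=0)) @ basis      # 2-D coordinates, e.g. for a scatter plot
```

One reason this projection is a reasonable choice: the difference between a point's squared distances to two centroids depends only on the point's component along the difference of those centroids. That difference lies in the projected plane, so every point keeps the same nearest centroid in the 2-D view as in the full space.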