I have 30 observations and 60 variables. I ran k-means clustering in R with 5 clusters. If I have to choose only 10 variables and show that they had more impact on forming the clusters than the other variables, what criteria should I consider?
Can you post the data? What do you mean by "I am supposed to choose..."? Is this an assignment? – gung - Reinstate Monica Jan 16 '17 at 17:38
In k-means, every cluster has exactly one center! – Has QUIT--Anony-Mousse Jan 22 '17 at 13:47
1 Answer
First, I do not quite understand your overall approach. You built a model (a clustering result) from the data, but how good is it? Why 5 clusters? If you have not verified that the result is good, why check which variables contribute more to an unverified result?
Let's assume you have checked that the clustering is good and just want to know which variables contribute more to the cluster centers. I would first check the scale of the variables. If you do not scale the variables, those measured on a large scale will usually look more important, because k-means works with squared Euclidean distances and those distances are dominated by the variables with the largest spread. Details can be found here.
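To make the scaling point concrete, here is a minimal sketch in plain Python. The two variables (`income`, `rating`) and their values are made up for illustration, and the helper names `standardize` and `distance_share` are hypothetical. It measures how much of the squared Euclidean distance between two observations comes from each variable, before and after standardizing. In R, the corresponding step would be calling `scale()` on the data before passing it to `kmeans()`.

```python
import statistics


def standardize(columns):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    scaled = []
    for col in columns:
        mean = statistics.mean(col)
        sd = statistics.stdev(col)
        scaled.append([(x - mean) / sd for x in col])
    return scaled


def distance_share(columns, i, j):
    """Fraction of the squared Euclidean distance between
    observations i and j that is contributed by each column."""
    squared_diffs = [(col[i] - col[j]) ** 2 for col in columns]
    total = sum(squared_diffs)
    return [d / total for d in squared_diffs]


# Hypothetical toy data: two variables on very different scales.
income = [30000.0, 52000.0, 41000.0, 75000.0]  # large scale
rating = [1.0, 5.0, 2.0, 4.0]                  # small scale

raw = [income, rating]
scaled = standardize(raw)

# Share of the distance between observations 0 and 1 due to each variable.
raw_share = distance_share(raw, 0, 1)
scaled_share = distance_share(scaled, 0, 1)

print("unscaled shares (income, rating):", raw_share)
print("scaled shares   (income, rating):", scaled_share)
```

On the unscaled data almost all of the distance comes from `income`, so the cluster assignment would be driven by that variable alone. After standardizing, both variables can contribute. Any ranking of variable importance is only meaningful once the variables are on a comparable scale.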