
I want to run k-means clustering in R on a dataset of 75 observations and 22 numerical variables, each ranging from 0 to 100. I read the post "How to understand the drawbacks of K-means" about the assumptions of k-means clustering. My questions are:

1- How can I check whether my clusters are spherical? With 22 variables, I cannot visualize them.

2- To check whether "all variables have the same variance", do I need a formal statistical test?

3- Regarding the assumption that "the prior probability for all k clusters are the same, i.e. each cluster has roughly equal number of observations": what should I do if I do not expect my clusters to be of similar size? In other words, clusters of different sizes are natural for my data.
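All three checks need cluster labels first, so they can only be done after a clustering has been run. The sketch below is in Python/NumPy rather than R and runs on hypothetical synthetic data (two groups of 50 and 25 observations in 22 dimensions), not on the asker's dataset. `kmeans`, `first_axis_share` and `pca_2d` are illustrative helpers written here, not library functions. `first_axis_share` is one assumed sphericity diagnostic among many: the fraction of a cluster's variance that lies along its first principal axis, which is about 1/p for a spherical cloud in p dimensions (higher for small clusters) and close to 1 for an elongated one. `pca_2d` gives a 2-D projection that can be plotted despite the 22 variables. In R, `kmeans()` (whose result includes a `size` component), `apply(X, 2, var)` and `prcomp()` cover the same steps.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point start (an assumption;
    R's kmeans() defaults to the Hartigan-Wong algorithm with random starts)."""
    # Farthest-point initialisation: start at row 0, then repeatedly add the
    # observation farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each observation to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members (keep it if it lost them all).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def first_axis_share(points):
    """Fraction of a cluster's total variance along its first principal axis."""
    centered = points - points.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values
    return float(s[0] ** 2 / np.sum(s ** 2))

def pca_2d(X):
    """Coordinates of the observations on the first two principal components."""
    centered = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :2] * s[:2]

rng = np.random.default_rng(1)
# Hypothetical stand-in data: 75 observations x 22 variables in [0, 100],
# built from two groups of unequal size (50 and 25).
X = np.clip(np.vstack([rng.normal(30, 5, size=(50, 22)),
                       rng.normal(70, 5, size=(25, 22))]), 0, 100)

labels, centers = kmeans(X, k=2)
print("cluster sizes:", np.bincount(labels, minlength=2))       # question 3
print("variances:", np.round(X.var(axis=0, ddof=1), 1))          # question 2
print("first-axis share:",                                        # question 1
      [round(first_axis_share(X[labels == j]), 2) for j in range(2)])
scores = pca_2d(X)  # 75 x 2 coordinates; plot these coloured by `labels`
```

The per-variable variances are purely descriptive here; whether to standardize the variables (for example with R's `scale()`) is a modelling choice rather than the outcome of a significance test.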

Hamideh
    With only 75 observations in 22 dimensions, you have a very sparse problem indeed. I'm afraid the [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) may bite you, and your clusters may not be very meaningful. – Stephan Kolassa Jul 06 '15 at 15:45
  • There is no such assumption as `all variables have the same variance` in K-means. The other two assumptions can hardly be tested in advance, because you must first obtain the clusters before you can check them. These points aren't "assumptions" in the narrow sense of the word; rather, they describe the cluster habitus that K-means is prone to form. – ttnphns Jul 06 '15 at 16:02
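To illustrate the sparsity concern raised in the first comment, the sketch below (Python/NumPy, on hypothetical structureless uniform data rather than the asker's dataset) computes a simple pairwise contrast, (max - min) / min over all Euclidean distances between observations, for 75 points in 2 versus 22 dimensions. `relative_contrast` is a helper written for this illustration. As the dimension grows, the closest and farthest pairs end up at more similar distances, which leaves a distance-based method such as k-means less contrast to work with.

```python
import numpy as np

def relative_contrast(X):
    """(max - min) / min over all pairwise Euclidean distances between rows of X.
    Small values mean the nearest and farthest pairs are at similar distances."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    iu = np.triu_indices(len(X), k=1)  # each unordered pair once, no self-distances
    pair = d[iu]
    return float((pair.max() - pair.min()) / pair.min())

rng = np.random.default_rng(0)
# Hypothetical structureless data: 75 observations, each variable uniform on [0, 100].
for p in (2, 22):
    X = rng.uniform(0, 100, size=(75, p))
    print(f"{p} dimensions: contrast = {relative_contrast(X):.2f}")
```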
