I was going through a document of Western Michigan University to understand the limitations of K-means clustering algorithms. Below is the link:
https://cs.wmich.edu/alfuqaha/summer14/cs6530/lectures/ClusteringAnalysis.pdf
On slide no 33 its mentioned that K-means has problems when clusters are of different
- Sizes
- Densities
- Non globular shapes
Since we explore our data and try to figure out the different groups that are present in our data through the k-means clustering algorithm, how would we know that the size of the clusters is different beforehand? We can visualize it if we have two-dimensional data but how it can be done if the data is n-dimensional? Is there any way to examine the data before proceeding to apply k-means.
Also, the explanation for the limitation is: if we have different sizes of clusters, k-means will not give the desirable clusters as it tries to partition the clusters equally. But I don't think its always the case. I had applied k-means on the following dataset with k-means++ initialization
https://archive.ics.uci.edu/ml/datasets/online+retail
It gave me clusters with highly uneven distribution of 4346, 23, 3. I think I am missing some prerequisite steps before proceeding. Please help me clear my doubts. Thanks.