I am preparing a dataset for K-means clustering, but one series in the data appears to be bimodal:

[image not available]

My question is:

Can I run K-means on bimodal data? If not, what kind of transformation could I apply to make the distribution more symmetric?

Thanks!

Cheng

1 Answer

Nothing in MacQueen's classic $K$-means clustering algorithm requires the data to be unimodal. As long as you can provide an initial partition of the items into $K$ clusters (with bimodal data, perhaps $K=2$ initially?) and can calculate the distance between each observation and each cluster's centroid at every step, you should be able to proceed without any problems.
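To illustrate, here is a minimal sketch rather than a definitive implementation. It uses the batch (Lloyd-style) update instead of MacQueen's one-observation-at-a-time update, and runs on made-up one-dimensional data with two modes. The function name `kmeans_1d`, the synthetic sample, and the choice of the smallest and largest values as starting centroids are all assumptions for illustration.

```python
import random


def kmeans_1d(values, centroids, max_iter=100):
    """Batch (Lloyd-style) k-means on a list of floats.

    `centroids` holds the initial guesses; its length sets K.
    """
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each value.
        labels = [
            min(range(len(centroids)), key=lambda j: (v - centroids[j]) ** 2)
            for v in values
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[j]
            )
        if new_centroids == centroids:  # no change, so the run has converged
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical bimodal sample: two Gaussian bumps centred at 0 and 6.
random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)]
data += [random.gauss(6, 1) for _ in range(200)]

# K = 2, starting from the smallest and largest observations.
centroids, labels = kmeans_1d(data, [min(data), max(data)])
print(centroids)
```

With $K=2$ the two centroids should settle close to the two modes, which is the sense in which bimodality poses no obstacle here.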

StatsStudent
  • I am learning about k-means and have been told that, before feeding anything into k-means, the data needs to be (1) log-transformed if it is skewed and (2) normalized. If the data is bimodal rather than skewed, I am not sure what to do. – Cheng Jan 01 '19 at 07:28
  • Who is telling you this? Normalization and standardization can be helpful in k-means, but they are certainly not required pre-processing steps. If your variables are measured on different scales and you would like to compare them, or if they have widely different variances, it's a good idea to normalize/standardize your data first. You may find this answer useful for deciding when it's appropriate: https://stats.stackexchange.com/questions/21222/are-mean-normalization-and-feature-scaling-needed-for-k-means-clustering. Log transformations can be helpful at times too (continued)... – StatsStudent Jan 01 '19 at 19:03
  • You should read ALL the answers here for a really nice explanation of the assumptions of k-means, as well as when it works well, when it doesn't, and how to pre-process the data so that it performs well: https://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means – StatsStudent Jan 01 '19 at 19:05
  • Thanks a lot for the links. I am going to study the links you have provided! – Cheng Jan 02 '19 at 23:46
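
For readers following the standardization point in the comments above, here is a small sketch of z-scoring two variables measured on very different scales before clustering. The `income` and `age` values and the helper name `standardize` are made up for illustration. Whether to standardize at all remains a judgment call, as the comments note.

```python
import statistics


def standardize(column):
    """Return z-scores: subtract the mean, divide by the standard deviation.

    Assumes the column is not constant (a zero standard deviation
    would cause a division by zero).
    """
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)  # population standard deviation
    return [(x - mu) / sigma for x in column]


# Hypothetical variables on very different scales.
income = [30000.0, 45000.0, 60000.0, 120000.0]
age = [25.0, 35.0, 45.0, 55.0]

income_z = standardize(income)
age_z = standardize(age)

# After standardizing, both variables have mean 0 and unit variance,
# so neither dominates the Euclidean distances used by k-means.
print(income_z)
print(age_z)
```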