I've read many papers that test k-means on datasets that are not normally distributed, such as the Iris dataset, and get good results. Since my understanding is that k-means is meant for normally distributed data, why is it used on data that are not normally distributed?
For example, the paper below modifies the k-means centroid update based on a normal distribution curve and tests the algorithm on the Iris dataset, which is not normally distributed.
> Nearly all inliers (precisely 99.73%) will have point-to-centroid distances within 3 standard deviations from the population mean.
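The 99.73% figure is the probability that a normally distributed value lies within ±3 standard deviations of its mean, P(|X − μ| ≤ 3σ) = erf(3/√2), so it holds exactly only if the point-to-centroid distances are themselves normally distributed. The minimal Python sketch below checks that number and shows one assumed reading of the rule applied to distances. `normal_coverage` and `three_sigma_inliers` are hypothetical helpers for illustration, not the K-means-sharp centroid update from the paper.

```python
import math
import statistics


def normal_coverage(k: float) -> float:
    """P(|X - mu| <= k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))


def three_sigma_inliers(points, centroid):
    """Hypothetical illustration (assumption, not the paper's algorithm):
    keep points whose Euclidean distance to the centroid is at most
    mean + 3 * (population) standard deviation of all such distances."""
    dists = [math.dist(p, centroid) for p in points]
    mu = statistics.mean(dists)
    sigma = statistics.pstdev(dists)
    return [p for p, d in zip(points, dists) if d <= mu + 3 * sigma]


print(round(normal_coverage(3), 4))  # 0.9973

# 20 points at distance 1 from the origin plus one far-away point.
pts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)] * 5 + [(50.0, 0.0)]
inliers = three_sigma_inliers(pts, (0.0, 0.0))
print(len(inliers))  # 20
```

Note that the cutoff says nothing about whether the 3σ band actually contains 99.73% of the points: for non-normal distances the coverage can differ.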
Is there something that I'm not understanding here?
- Olukanmi & Twala (2017). K-means-sharp: Modified centroid update for outlier-robust k-means clustering
- Iris dataset