
I am talking about the K-means clustering problem.

Theoretically, my understanding is:

If there are K clusters, the K-means algorithm outputs K centroids/means that minimize the total SSE (sum of squared errors), i.e. the sum of the within-cluster SSEs over all K clusters.

Now, I was trying to understand how each iteration works, so I referred to the 'An example' section at K-Means clustering.

This suggests the following procedure for finding the k means:

Make initial guesses for the means m1, m2, ..., mk

Until there are no changes in any mean

Use the estimated means to classify the samples into clusters. 
**(we can use a minimum-distance classifier to separate them.
That is, we can say that x is in cluster i if || x - mi || 
is the minimum of all the k distances)**

    For i from 1 to k
        Replace mi with the mean of all of the samples for cluster i
    end_for

end_until
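
To make this concrete, here is a minimal NumPy sketch of the procedure as I understand it (the toy data, variable names, and the per-iteration SSE printout are my own, not part of the linked example):

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # initial guesses for the means: k distinct data points
        means = X[rng.choice(len(X), size=k, replace=False)].copy()
        for it in range(max_iter):
            # classification step: assign each sample to its nearest mean
            dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # update step: replace each mean with the mean of its cluster
            new_means = np.array([
                X[labels == i].mean(axis=0) if np.any(labels == i) else means[i]
                for i in range(k)
            ])
            # total SSE: the algorithm's steps never reference this quantity
            # directly; I only compute it to watch how it changes per iteration
            total_sse = ((X - new_means[labels]) ** 2).sum()
            print(f"iteration {it}: total SSE = {total_sse:.4f}")
            if np.allclose(new_means, means):  # "no changes in any mean"
                break
            means = new_means
        return means, labels

    # toy data: two well-separated 2-D blobs
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    centroids, labels = kmeans(X, k=2)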

I don't see this iteration dealing, at any point, with the SUM of the SSEs. It just considers each cluster independently.

So, can I say that the above iterative process doesn't fully encapsulate the details of the K-means algorithm, and that it is a sub-optimal approach which can work in some cases and NOT in others?

  • What do you mean by "cumulative" in this instance and how does it differ from "summative"? – ttnphns Oct 25 '18 at 17:56
  • Cumulative means sum; they are used with the same meaning. I have reworded it. – wanderer Oct 25 '18 at 18:07
  • The arithmetic mean (aka centroid) is the locus of minimal SS of deviations from it. To get the minimal SS is thus to set the origin at the cloud's centroid. Now, imagine we have several clouds (clusters) in a common space. We know that SStotal = SSerror + SSbetween. So, to minimize SSerror, maximize SSbetween of the cluster centroids, i.e. pull the centroids as far apart as possible. K-means iteratively assigns points to their closest means, i.e. it _tries_ to pull the centroids apart. – ttnphns Oct 26 '18 at 11:37
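
For reference, the decomposition ttnphns mentions can be written out as follows (my own notation: cluster $C_i$ has $n_i$ points and centroid $m_i$, and $\bar{x}$ is the overall mean of the data):

$$\underbrace{\sum_{i=1}^{k}\sum_{x\in C_i}\lVert x-\bar{x}\rVert^2}_{SS_\text{total}} \;=\; \underbrace{\sum_{i=1}^{k}\sum_{x\in C_i}\lVert x-m_i\rVert^2}_{SS_\text{error}} \;+\; \underbrace{\sum_{i=1}^{k} n_i\,\lVert m_i-\bar{x}\rVert^2}_{SS_\text{between}}$$

Since $SS_\text{total}$ is fixed by the data, minimizing $SS_\text{error}$ is equivalent to maximizing $SS_\text{between}$.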
