I am talking about K-means clustering problem.
Theoretically, my understanding is :
If there are K clusters, the K-means algorithm outputs K centroids/means such that it minimizes total sum of SSE (Sum squared error) of all clusters.
Now, I was trying to understand how each iteration works. I referred 'an example' section at K-Means clustering .
This suggests the following procedure for finding the k means:
Make initial guesses for the means m1, m2, ..., mk
Until there are no changes in any mean
Use the estimated means to classify the samples into clusters.
**(we can use a minimum-distance classifier to separate them.
That is, we can say that x is in cluster i if || x - mi ||
is the minimum of all the k distances)**
For i from 1 to k
Replace mi with the mean of all of the samples for cluster i
end_for
end_until
I don't see, at any point, this iteration dealing with SUM of SSEs. It just considers each clusters independently.
So, can I say the above iteration process doesn't totally encapsulate the details of K-means algorithm. It is a sub-optimal approach which can work in some cases and NOT in some other cases?