How to evaluate k-means considering initial conditions when having the ground truth?

Question

I use the kernel k-means algorithm with different kernels and want to see which kernel is best. To do so, I fix $K$ equal to the number of classes in the ground truth and check the accuracy of the clustering result against the true labels.

I also run it 1000 times with different initial points and take the best clustering result. But I think this evaluation is too supervised, as I pick the best initial point based on my knowledge of the ground truth.

Isn't it better to split into test/train batches and cluster the test data according to cluster centers obtained from the most accurate clustering on train data?
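Since cluster ids are arbitrary, the accuracy check described above needs a matching between clusters and classes before it can be computed. A minimal sketch, assuming the accuracy is taken under the best one-to-one mapping of clusters to classes; the function name `clustering_accuracy` is hypothetical, and the brute-force search over permutations is only practical for a small number of classes:

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping from cluster ids to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes at most one cluster per class")

    # Contingency counts: how many points of class t ended up in cluster k.
    counts = {}
    for t, k in zip(true_labels, cluster_ids):
        counts[(k, t)] = counts.get((k, t), 0) + 1

    # Try every assignment of distinct classes to the clusters, keep the best.
    best_hits = 0
    for assigned in permutations(classes, len(clusters)):
        hits = sum(counts.get((k, c), 0) for k, c in zip(clusters, assigned))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Cluster ids are swapped relative to the classes, yet the partition is perfect.
print(clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))
```

For many classes the factorial search becomes too slow; an assignment solver such as `scipy.optimize.linear_sum_assignment` applied to the contingency table is the usual replacement.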

Bob, let me recommend that you start from my comment [here](http://stats.stackexchange.com/q/245364/3277): it points to a thread where further relevant links can be found. — ttnphns, Nov 12 '16 at 07:04
@ttnphns: Thank you for that link, which is very useful. Coming back to my problem, what about methods that are sensitive to the initial point? Do you think it is a good idea to use cross-validation, and for each fold run the algorithm multiple times on the training data to find the best initial point and then apply that to the test data in the fold? — Bob, Nov 12 '16 at 14:14
If all you care about is obtaining the best possible clustering result for an application, I think it's fine to "supervisedly" adjust the initial condition. However, if you're comparing your approach with other approaches, you should take the mean result over several runs with random initialization. Otherwise we could not be sure whether your results are due to your algorithm or just because you got lucky. — felipeduque, Feb 19 '17 at 20:09
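The suggestion to average over several randomly initialized runs can be sketched as follows. This is an illustration under stated assumptions, not the asker's setup: the RBF kernel, the two `gamma` values, the toy two-blob data, and all function names are hypothetical, and the kernel k-means here is a plain pure-Python version that starts from random label assignments and leaves empty clusters empty.

```python
import math
import random
import statistics
from itertools import permutations


def rbf_kernel_matrix(points, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||p - q||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]


def kernel_kmeans(K, k, rng, max_iter=50):
    """Kernel k-means on a precomputed Gram matrix K, random initial labels."""
    n = len(K)
    labels = [rng.randrange(k) for _ in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Squared norm of each cluster mean in feature space.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster cannot attract points
                # Squared feature-space distance from point i to the mean of cluster c.
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def accuracy(truth, pred, k):
    """Accuracy under the best mapping of cluster ids to classes 0..k-1."""
    best = max(sum(perm[p] == t for t, p in zip(truth, pred))
               for perm in permutations(range(k)))
    return best / len(truth)


def evaluate(K, truth, k, n_runs, seed):
    """Mean and standard deviation of accuracy over n_runs random starts."""
    rng = random.Random(seed)
    scores = [accuracy(truth, kernel_kmeans(K, k, rng), k) for _ in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores)


# Toy data (an assumption for illustration): two Gaussian blobs in 2D.
data_rng = random.Random(0)
points = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(20)]
points += [(data_rng.gauss(3, 0.3), data_rng.gauss(3, 0.3)) for _ in range(20)]
truth = [0] * 20 + [1] * 20

for gamma in (0.5, 50.0):
    K = rbf_kernel_matrix(points, gamma)
    mean_acc, std_acc = evaluate(K, truth, k=2, n_runs=20, seed=1)
    print(f"gamma={gamma}: accuracy {mean_acc:.3f} +/- {std_acc:.3f}")
```

Reporting the mean together with the spread, rather than the best of many runs, keeps the ground truth out of the choice of initial point, so a difference between kernels is less likely to come from a lucky start.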


0 Answers