What is a good method to select initial seeds in k-means?

Question

In text document clustering with k-means as the base algorithm, where the vector space model (VSM) is a document-term matrix weighted by tf-idf, what is the best metric for selecting optimal initialization points (seed points) from which the clustering procedure starts?

k-means is mostly heuristic based and has many drawbacks in comparison to model based clustering. You should review http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means/133694#133694 — Jon, Mar 10 '17 at 16:32

score 2 · Accepted Answer · answered Mar 10 '17 at 16:37

A common approach is to use random initialization points, and run the algorithm multiple times, keeping the seed that minimizes your clustering error metric.
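As a rough illustration of the restart idea, here is a minimal pure-Python sketch. It assumes the "clustering error metric" is the within-cluster sum of squared Euclidean distances, and that each document is a dense vector (for example, one tf-idf row). The names `kmeans_once` and `best_of_restarts` are made up for this example and do not come from any library.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run seeded with k randomly chosen data points.

    Returns (sse, centers, labels), where sse is the within-cluster
    sum of squared distances (the error metric assumed here).
    """
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return sse, centers, labels

def best_of_restarts(points, k, n_restarts=10, seed=0):
    """Run k-means n_restarts times with different random seeds and
    keep the run with the lowest error."""
    best = None
    for r in range(n_restarts):
        result = kmeans_once(points, k, random.Random(seed + r))
        if best is None or result[0] < best[0]:
            best = result
    return best

# Toy usage: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
sse, centers, labels = best_of_restarts(points, k=2)
```

For real tf-idf matrices, which are large and sparse, a library implementation is likely the better choice. scikit-learn's `KMeans`, for instance, exposes an `n_init` parameter for the number of restarts and reports the final error as `inertia_`.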

Zach

