0

In text document clustering, when k-means is used as the base algorithm and the vector space model (VSM) is a document-term matrix weighted by tf-idf, what is the best metric for selecting optimal initialization points (seed points) from which the clustering procedure starts?

azifallail
  • k-means is mostly heuristic-based and has many drawbacks compared to model-based clustering. You should review http://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means/133694#133694 – Jon Mar 10 '17 at 16:32

1 Answer

2

A common approach is to use random initialization points and run the algorithm multiple times, keeping the seed that minimizes your clustering error metric (for example, the within-cluster sum of squared distances).
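As a rough illustration of the restart idea, here is a minimal sketch in plain Python. The function names (`kmeans_once`, `best_of_restarts`) and the parameters `n_restarts` and `seed` are made up for this example, and it assumes the error metric is the sum of squared Euclidean distances to the assigned centers. For tf-idf rows you would probably want to L2-normalize each document vector first so that this distance behaves like cosine dissimilarity, but that step is not shown.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    # One k-means run, seeded with k distinct points drawn at random.
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Assumed error metric: within-cluster sum of squared distances.
    sse = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
    return sse, assign, centers

def best_of_restarts(points, k, n_restarts=10, seed=0):
    # Run k-means from several random seeds and keep the lowest-error run.
    best = None
    for r in range(n_restarts):
        result = kmeans_once(points, k, random.Random(seed + r))
        if best is None or result[0] < best[0]:
            best = result + (seed + r,)
    return best  # (sse, assignments, centers, winning_seed)

# Toy data standing in for (already vectorized) documents.
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sse, labels, centers, winning_seed = best_of_restarts(docs, k=2)
print(sse, labels, winning_seed)
```

The winning seed is returned alongside the clustering so the best run can be reproduced later. Library implementations typically expose the same idea as a "number of initializations" option.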

Zach