In this article, Abou-Mustafa and Schuurmans proposed a method that makes it easy to decide which unsupervised learning algorithm generalizes 'better' to the entire dataset. In particular, the method requires an external loss function l to measure reconstruction errors; for k-means clustering, l could be set to the L1 distance from each point to its nearest centroid. For my thesis, I am looking for a way to compare k-means, spectral clustering and (H)DBSCAN density-based clustering. Do you have any hints on what kind of external loss function could be used for the latter two? Or, if this is difficult (I see some problems with point-wise loss functions for density-based algorithms), are you aware of better ways to compare these algorithms?
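For concreteness, the k-means loss mentioned above can be sketched in Python with NumPy. This is a minimal sketch, not the authors' definition of l: it assumes "nearest" means nearest in Euclidean distance (as in the k-means assignment step) and then reports the L1 distance to that centroid. The function name `l1_nearest_centroid_loss` and the toy data are hypothetical.

```python
import numpy as np

def l1_nearest_centroid_loss(X, centroids):
    """Per-point L1 distance from each row of X to its nearest centroid.

    Assumption (hypothetical choice): "nearest" is decided by Euclidean
    distance, as in the k-means assignment step; the returned loss is L1.
    """
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # diff has shape (n_points, n_centroids, n_features)
    diff = X[:, None, :] - centroids[None, :, :]
    # k-means assignment: index of the closest centroid in squared L2
    nearest = np.argmin((diff ** 2).sum(axis=2), axis=1)
    # L1 distance from each point to its assigned centroid
    return np.abs(diff[np.arange(len(X)), nearest]).sum(axis=1)

# Hypothetical toy data: three 2-D points and two centroids
X = np.array([[0.0, 0.0], [1.0, 1.0], [9.0, 10.0]])
centroids = np.array([[0.5, 0.5], [10.0, 10.0]])
losses = l1_nearest_centroid_loss(X, centroids)
print(losses)         # [1. 1. 1.]
print(losses.mean())  # 1.0 (average reconstruction error)
```

The per-point losses are returned rather than only their mean, so they can be aggregated however the evaluation method requires. Note that this only works because k-means yields centroids; spectral clustering and (H)DBSCAN do not produce an obvious prototype per cluster, which is exactly the difficulty raised in the question.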