Given a dataset $D$ and a distance measure, I want to split the dataset into two disjoint subsets $X, Y$ of specified sizes (say 80% and 20% of the original size), so that the minimum distance over all pairs $(x, y)$ with $x \in X$ and $y \in Y$ is maximized. I found references to max-margin clustering without the size constraint; however, my feeling is that the constraint should make the problem easier. Does it? Is there some straightforward way to solve this problem?
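(To make the objective concrete, here is a brute-force sketch that evaluates it exactly on toy data. This is only an illustration of the quantity being maximized, not a practical algorithm — it enumerates all $\binom{n}{|X|}$ splits, so it is exponential in $n$; the `dist` callable and the split size are parameters standing in for the 80/20 choice.)

```python
from itertools import combinations
import math

def min_cross_distance(D, X_idx, Y_idx, dist):
    """Smallest distance over all pairs (x, y) with x in X and y in Y."""
    return min(dist(D[i], D[j]) for i in X_idx for j in Y_idx)

def best_split(D, x_size, dist):
    """Brute-force search over all subsets X of size x_size.
    Exponential in len(D); only feasible for tiny datasets."""
    n = len(D)
    best_margin, best_X = -math.inf, None
    for X_idx in combinations(range(n), x_size):
        Y_idx = [j for j in range(n) if j not in set(X_idx)]
        m = min_cross_distance(D, X_idx, Y_idx, dist)
        if m > best_margin:
            best_margin, best_X = m, X_idx
    return best_X, best_margin

# toy 1-D example: four close points and one distant point
data = [0.0, 0.1, 0.2, 0.3, 5.0]
X_idx, margin = best_split(data, 4, lambda a, b: abs(a - b))
# the optimal 4/1 split isolates the distant point, with margin 5.0 - 0.3
```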
- Can you please cite these references? – mpiktas Mar 07 '11 at 08:07
- @mpiktas - well, it was just the result of a Google search on max-margin clustering; the "original" paper on it seems to be http://webdocs.cs.ualberta.ca/~dale/papers/nips04.ps.gz (Xu et al, NIPS 2004) – etarion Mar 07 '11 at 09:33
1 Answer
A somewhat vague answer, but I'll give it a chance. I always felt that the size constraint is the central idea behind this method -- without it, the problem seems to collapse into other approaches: effectively into 2-means, and ideologically into unsupervised SVM. The former rather invalidates the idea; the latter is more intriguing, since you may hope to save some pain by reusing the SVM optimization framework and kernel tricks.
- I'm not quite sure I understand your answer. Did you mean that "the lack of a size constraint" is the central idea? (Then the rest of the answer makes sense to me, kind of) – etarion Mar 07 '11 at 09:36
- @etarion No, rather that it is the method for solving the problem of clustering into two groups of a given size. And thus without the constraint, it just seems to dissolve into unsupervised SVM. – Mar 07 '11 at 11:03