
I have a D-dimensional dataset composed of exactly two clusters (this is known) for which I have no labels; the clusters can potentially be wildly imbalanced.

I'm after a soft (or fuzzy) clustering method that assigns each element a probability of belonging to either cluster. So far I've been able to come up with basically two:

Then there are hard clustering methods that could perhaps be softened by re-running them with varied inputs (and averaging over all the iterations?):

And finally there are methods that I'm not sure can be applied in an unsupervised way at all:

Am I missing some method? Did I misclassify any of the above? Is any method better suited to my particular issue than the rest?

Any insight will be much appreciated.

Gabriel

1 Answer


UPDATE

(Looking back at the question, I would recommend running fuzzy k-means (FKM) and a Gaussian mixture model (GMM), then publishing or presenting an analysis based on those results. There is nothing wrong with using FKM/GMM for your problem. You may well be "missing" many lesser-known methods that produce probabilities, so compiling a list that misses nothing would be an open-ended task.)
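To make the FKM half of that recommendation concrete, here is a minimal NumPy sketch of fuzzy k-means (also called fuzzy c-means) for two clusters. The function name `fuzzy_kmeans`, the fuzzifier `m=2.0`, and the synthetic 950/50 imbalanced data are all assumptions made for illustration; they are not taken from the question, and a dedicated library would likely be a better choice for real work.

```python
import numpy as np

def fuzzy_kmeans(X, k=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (memberships, centroids); memberships is (n, k) with rows summing to 1."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centroids are means of the data weighted by memberships raised to m.
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # Standard update: membership proportional to distance^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centroids

# Hypothetical imbalanced data: 950 points in one cluster, 50 in the other.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0.0, 1.0, size=(950, 2)),
               data_rng.normal(6.0, 1.0, size=(50, 2))])

U, centroids = fuzzy_kmeans(X)
print(U[:3])        # membership values of the first three points
print(centroids)    # the two fitted centroids
```

Each row of `U` holds the two membership values for one point. Note that these are membership degrees rather than probabilities derived from a generative model, and that fuzzy k-means is known to be sensitive to strong imbalance, so the result is worth checking against the GMM output.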

Only fuzzy k-means provides "membership function" values for each object, while GMM provides cluster-specific posterior probabilities, $P(\text{cluster} \mid x)$. "Crisp" k-means cluster analysis (the opposite of fuzzy, i.e., the typical k-means) can provide something like a cluster-specific probability, but only if you are willing to assume that the Euclidean distance to the closest centroid represents a probability. There may be more methods that do this, but generally speaking, most other unsupervised methods are based on distance metrics, not probabilities.
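As a rough illustration of how a two-component GMM yields $P(\text{cluster} \mid x)$, the sketch below runs a plain EM loop in NumPy and returns the posterior for each point. Everything named here (`gmm_posteriors`, the farthest-point initialisation, the `reg` regulariser, the synthetic 950/50 data) is an assumption of this sketch rather than something from the question; in practice an established implementation would be preferable.

```python
import numpy as np

def gmm_posteriors(X, n_iter=200, reg=1e-6):
    """EM for a 2-component full-covariance GMM; returns (posteriors, weights, means)."""
    n, d = X.shape
    # Deterministic init (a choice made for this sketch): the point farthest
    # from the data mean, and the point farthest from that one.
    far = np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1))
    other = np.argmax(((X - X[far]) ** 2).sum(axis=1))
    means = X[[far, other]].astype(float)
    covs = np.stack([np.cov(X, rowvar=False) + reg * np.eye(d)] * 2)
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: log of weight times Gaussian density, per component.
        log_p = np.empty((n, 2))
        for j in range(2):
            L = np.linalg.cholesky(covs[j])
            z = np.linalg.solve(L, (X - means[j]).T)
            maha = (z ** 2).sum(axis=0)
            log_det = 2.0 * np.log(np.diag(L)).sum()
            log_p[:, j] = np.log(weights[j]) - 0.5 * (
                d * np.log(2.0 * np.pi) + log_det + maha)
        # Normalise in log space to get P(cluster | x) for each point.
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances.
        nk = post.sum(axis=0)
        weights = nk / n
        means = (post.T @ X) / nk[:, None]
        for j in range(2):
            diff = X - means[j]
            covs[j] = (post[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    # Posteriors come from the last E-step.
    return post, weights, means

# Hypothetical imbalanced data: 950 points in one cluster, 50 in the other.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(950, 2)),
               rng.normal(6.0, 1.0, size=(50, 2))])

post, weights, means = gmm_posteriors(X)
print(weights)     # estimated mixing proportions
print(post[:3])    # P(cluster | x) for the first three points
```

Unlike fuzzy k-means, the mixing weights are estimated from the data, which is one reason a GMM can cope with imbalanced clusters.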

I wouldn't get hung up on being exact about which of the various unsupervised methods do and don't provide cluster-specific probabilities. If you are writing a review, book chapter, or a report for school, you shouldn't ask this forum about it.

Have you surveyed the full literature on this topic? You'd get a much better answer by doing your own research.

  • See the updated answer: if a final list were required, it would be an open-ended question. –  Mar 15 '20 at 18:38
  • Also, did you run FKM and GMM yet, and submit the paper or give the talk? What did the reviewers (audience) say? –  Mar 15 '20 at 18:42