28

The second question: I came across a discussion somewhere on the web about "supervised clustering". As far as I know, clustering is unsupervised, so what exactly is meant by "supervised clustering"? What is the difference with respect to "classification"?

There are many links discussing this:

http://www.cs.uh.edu/docs/cosc/technical-reports/2005/05_10.pdf

http://books.nips.cc/papers/files/nips23/NIPS2010_0427.pdf

http://engr.case.edu/ray_soumya/mlrg/supervised_clustering_finley_joachims_icml05.pdf

http://www.public.asu.edu/~kvanlehn/Stringent/PDF/05CICL_UP_DB_PWJ_KVL.pdf

http://www.machinelearning.org/proceedings/icml2007/papers/366.pdf

http://www.cs.cornell.edu/~tomf/publications/supervised_kmeans-08.pdf

http://jmlr.csail.mit.edu/papers/volume6/daume05a/daume05a.pdf

etc ...

shn
  • please give link of "discussion somewhere on the web" – Atilla Ozgur Sep 25 '12 at 21:37
  • @AtillaOzgur there are many links talking about supervised clustering; I added some of them to my post (see the list above). – shn Sep 26 '12 at 08:58
  • "Clustering" is synonymous with "unsupervised classification", so "supervised clustering" is an oxymoron. One could argue, though, that Self Organising Maps are a supervised technique used for unsupervised classification, which would be the closest thing to "supervised clustering". – Digio Aug 20 '15 at 08:41
  • As far as I understand it, we use clustering to arrange the data so it is ready for further processing, or at least for further analysis: clustering divides the data into classes A, B, C and so on, so the data is now labelled in some manner. What you do with that data next, whether classification or regression, depends on your requirements. Correct me if I am wrong. – sak Aug 07 '19 at 17:27

4 Answers

21

I don't think I know more than you do, but the links you posted do suggest answers. I'll take http://www.cs.cornell.edu/~tomf/publications/supervised_kmeans-08.pdf as an example. Basically the authors state that (1) clustering depends on a distance, (2) successful use of k-means requires a carefully chosen distance, and (3) given training data in the form of item sets with their desired partitionings, they provide a structural SVM method that learns a distance measure so that k-means produces the desired clusterings.

In this case there is a supervised stage to the clustering, with both training data and learning. The purpose of this stage is to learn a distance function such that applying k-means clustering with this distance will hopefully be close to optimal, depending on how well the training data resembles the application domain. All the usual caveats appropriate to machine learning and clustering still apply.

Further quoting from the article: "Supervised clustering is the task of automatically adapting a clustering algorithm with the aid of a training set consisting of item sets and complete partitionings of these item sets." That seems like a reasonable definition.
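Not the structural-SVM machinery of the paper, but a minimal sketch of the general recipe it describes (learn something about the distance from a fully partitioned training set, then run ordinary k-means on new data with it), assuming scikit-learn and synthetic data; here the "learned distance" is just a diagonal feature re-weighting, and all names and numbers are made up:

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_feature_weights(X_train, y_train, eps=1e-12):
    """Weight each feature by its between-cluster / within-cluster variance
    ratio on the training partitioning, so that k-means emphasises the
    features that separated the desired clusters."""
    overall_mean = X_train.mean(axis=0)
    between = np.zeros(X_train.shape[1])
    within = np.zeros(X_train.shape[1])
    for label in np.unique(y_train):
        Xc = X_train[y_train == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return np.sqrt(between / (within + eps))

# Training set: items plus their *desired* partitioning (the supervision).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = (X_train[:, 0] > 0).astype(int)   # desired clusters depend on feature 0 only

w = learn_feature_weights(X_train, y_train)

# New, unlabeled data from the same domain: cluster it in the re-weighted space.
X_new = rng.normal(size=(100, 5))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_new * w)
```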

micans
  • The problem is simply: why would you want to learn a distance measure from a set of labelled training data and then apply this distance measure with a clustering method, rather than just using a supervised method? In other words, you want to do clustering (i.e. partition your dataset into clusters), but you assume that you already have the complete desired partitioning and that you will use it to learn a distance measure, then apply clustering on this dataset using this learned distance. At best, you'll get the same partitions that you used to learn the distance measure! You already have… – shn Oct 10 '12 at 17:58
  • Where you write "then apply clustering on this dataset", substitute "then apply clustering on similar datasets". It is this scenario: in experiment X we have data A and B. A is for clustering, B helps with learning the distance. B sets a gold standard and is presumably expensive to obtain. In subsequent experiments X2, X3, … we obtain A but cannot afford to obtain B. – micans Oct 11 '12 at 09:10
  • OK, now when you say "learning a distance" from a dataset B: do you mean "learning some distance threshold value" or "learning a distance metric function" (a sort of parametrised dissimilarity measure)? – shn Oct 11 '12 at 11:17
  • I mean the second, "learning a distance metric function". Upon more reading by the way, my simple A and B formulation above can be found in the quoted manuscript: "Given training examples of item sets with their correct clusterings, the goal is to learn a similarity measure so that future sets of items are clustered in a similar fashion." – micans Oct 11 '12 at 11:30
  • Well, it seems then that "supervised clustering" is very similar to what is called "semi-supervised clustering"; so far I don't really see any difference. By the way, in some other papers "(semi-)supervised clustering" does not refer to "creating a modified distance function" to be used to cluster future datasets in a similar fashion; it is rather about "modifying the clustering algorithm itself" without changing the distance function! – shn Oct 11 '12 at 11:48
  • This is not my field, so I don't know whether these definitions are firmly established. According to the quoted paper, "A related field is semi-supervised clustering, where it is common to also learn a parameterized similarity measure [3, 4, 6, 15]. However, this learning problem is markedly different from supervised clustering. In semi-supervised clustering, the user has a single large dataset to cluster, with incomplete information about clustering, usually in the form of pairwise constraints about cluster membership. This difference leads to very different algorithms in the two settings." – micans Oct 11 '12 at 11:55
  • @shn, I think, as the chosen answer suggests, clustering (unsupervised/supervised/semi-supervised) has the advantage that it can adapt to any number of classes depending on the data, while in supervised classification we may have identified a fixed set of classes and the algorithm has to learn and produce "mappings" to only one of these. (However, you can mix these ideas, e.g. learn a distance function in a supervised way and use it to cluster points; this would still have the ability to cluster data into sets whose categories have not been identified in the ground truth used for supervision.) – np20 Oct 05 '17 at 22:47
6

Some definitions:

Supervised clustering is applied to classified examples with the objective of identifying clusters that have a high probability density with respect to a single class.

Unsupervised clustering is a learning framework that uses a specific objective function, for example a function that minimizes the distances within a cluster to keep the cluster tight.

Semi-supervised clustering enhances a clustering algorithm by using side information in the clustering process.

Advances in Neural Networks -- ISNN 2010

Without using too much jargon, since I'm a novice in this area, the way I understand supervised clustering is more or less like this:

In supervised clustering you start top-down with some predefined classes, and then, using a bottom-up approach, you find which objects fit best into those classes.

For example, suppose you performed a study on the favorite type of orange in a population. Of the many types of oranges, you found that a particular kind is the preferred one. However, that type of orange is very delicate and susceptible to infections, climate change and other environmental insults. So you want to cross it with another species that is very resistant to those insults. You go to the lab and find some genes that are responsible for the juicy and sweet taste of one type, and for the resistance of the other type. You perform several experiments and end up with, let's say, a hundred different subtypes of oranges. Now you are interested only in those subtypes that fit the desired properties perfectly. You don't want to perform the same study on your population again: you already know the properties you are looking for in your perfect orange. So you run your cluster analysis and select the subtypes that best fit your expectations.
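Read as code, that last step might look roughly like this (purely illustrative; the feature columns, numbers and cluster count are made up):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical measurements for 100 engineered subtypes; the columns stand for
# sweetness, juiciness, infection resistance and climate tolerance (all made up).
subtypes = rng.uniform(0, 1, size=(100, 4))

# The "perfect orange" profile you already defined from the earlier study.
target_profile = np.array([0.9, 0.9, 0.8, 0.8])

# Cluster the subtypes, then keep the cluster whose centroid is closest
# to the profile you are looking for.
km = KMeans(n_clusters=5, n_init=10, random_state=1).fit(subtypes)
best_cluster = np.argmin(np.linalg.norm(km.cluster_centers_ - target_profile, axis=1))
selected_subtypes = subtypes[km.labels_ == best_cluster]
```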

Diego
5

My naive understanding is that classification is performed when you have a specified set of classes and you want to assign a new item/dataset to one of those specified classes.

Clustering, by contrast, starts with nothing predefined: you use all the data (including the new items) and separate it into clusters.

Both use distance metrics to decide how to cluster/classify. The difference is that classification is based on a previously defined set of classes, whereas clustering decides the clusters based on the entire data.
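To make the contrast concrete, a tiny sketch assuming scikit-learn (the data, class count and the choice of k-NN and k-means are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X_old = rng.normal(size=(300, 2))
y_old = rng.integers(0, 3, size=300)        # three predefined classes
X_new = rng.normal(size=(50, 2))            # new, unlabelled points

# Classification: the classes are fixed in advance, and each new point
# is mapped onto one of those predefined classes.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_old, y_old)
new_classes = clf.predict(X_new)

# Clustering: no predefined classes; old and new points together are
# partitioned purely from their distances to each other.
all_points = np.vstack([X_old, X_new])
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=2).fit_predict(all_points)
```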

Again, my naive understanding is that supervised clustering still clusters based on the entire data, and thus would be clustering rather than classification.

In reality I'm sure the theory behind clustering and classification is intertwined.

adunaic
  • I humbly disagree. You're suggesting that "classification" is by definition and by default a supervised process, which is not true. Classification is divided into supervised and unsupervised cases, the latter being synonymous with clustering. – Digio Aug 20 '15 at 08:46
0

My interpretation has to do with the number of training samples you have per class.

If you have a lot of training samples per class, then you can reasonably train a classifier and you have a classification use case.

If you only have training samples for a fraction of the classes, then a classifier would perform poorly, but a clusterer could be useful. You can optimize this clusterer with the labels you do have (optimize the distance, the features, etc.), and hopefully this optimization will also be useful on unlabelled data. This is a (semi-)supervised clustering use case.
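A minimal sketch of that idea, assuming scikit-learn and synthetic data; here the only thing tuned with the partial labels is the number of clusters, but the same loop could just as well tune a feature weighting or a distance:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(3)
# Three well-separated groups of points; we only know the labels of a few of them.
X = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in (0, 5, 10)])
true_labels = np.repeat([0, 1, 2], 100)
labeled_idx = rng.choice(len(X), size=30, replace=False)
partial_labels = true_labels[labeled_idx]

# Tune the clusterer so that it agrees with the labels we do have.
best_k, best_score = None, -1.0
for k in range(2, 8):
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = adjusted_rand_score(partial_labels, pred[labeled_idx])
    if score > best_score:
        best_k, best_score = k, score

# Apply the tuned clusterer to all the data, labelled and unlabelled alike.
final_clusters = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
```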

marc