
From what I have read:


Distant supervision:

A distant supervision algorithm usually has the following steps: 
1] It may have some labeled training data. 
2] It has access to a pool of unlabeled data. 
3] It has an operator that samples from this unlabeled data and labels it; this operator is expected to be noisy in its labels. 
4] The algorithm then collectively uses the original labeled training data (if there was any) and this new noisily labeled data to produce the final output (see the sketch after this list).
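
To make sure I am reading these steps right, here is a minimal sketch of that pipeline (everything here is hypothetical, in particular the noisy labelling operator, which stands in for whatever heuristic a given paper uses; labels are assumed to be booleans throughout):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def noisy_label(text, knowledge_base):
        # Hypothetical labelling operator (step 3): returns a possibly wrong
        # label, e.g. "positive" if the text mentions any known entity.
        return any(entity in text for entity in knowledge_base)

    def distant_supervision(seed_texts, seed_labels, unlabeled_texts, knowledge_base):
        # Steps 2-3: label the unlabeled pool with the noisy operator.
        noisy_labels = [noisy_label(t, knowledge_base) for t in unlabeled_texts]
        # Step 4: pool the (optional) seed data with the noisily labeled data
        # and train an ordinary supervised classifier on the union.
        texts = list(seed_texts) + list(unlabeled_texts)
        labels = list(seed_labels) + noisy_labels
        vectorizer = TfidfVectorizer()
        X = vectorizer.fit_transform(texts)
        classifier = LogisticRegression().fit(X, labels)
        return vectorizer, classifier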

Self-training:

[image illustrating self-training]


Self-learning (Yates, Alexander, et al. "Textrunner: open information extraction on the web." Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. Association for Computational Linguistics, 2007.):

The Learner operates in two steps. First, it automatically labels its own training data as positive or negative. Second, it uses this labeled data to train a Naive Bayes classifier.
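
If I translate that into code, the two-step pattern looks roughly like this (a toy illustration only, not TextRunner itself; the length-based rule is a made-up stand-in for its parser-based heuristics):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    def self_label(candidate):
        # Made-up heuristic standing in for TextRunner's syntactic rules:
        # call a candidate extraction "positive" if its phrase is short.
        return len(candidate.split()) <= 5

    def train_self_labeled_nb(candidates):
        # Step 1: the learner labels its own training data.
        y = [self_label(c) for c in candidates]
        # Step 2: it trains a Naive Bayes classifier on that self-labeled data.
        vectorizer = CountVectorizer()
        X = vectorizer.fit_transform(candidates)
        return vectorizer, MultinomialNB().fit(X, y)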


Weak Supervision (Hoffmann, Raphael, et al. "Knowledge-based weak supervision for information extraction of overlapping relations." Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 2011.):

A more promising approach, often called “weak” or “distant” supervision, creates its own training data by heuristically matching the contents of a database to corresponding text.
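
The "heuristic matching" part, as I understand it, is something like the following toy function (the real systems of course add entity linking, sentence-level features, and so on):

    def match_kb_to_text(sentences, kb):
        # kb maps entity pairs to relations, e.g. {("Obama", "Hawaii"): "born_in"}.
        # If a sentence mentions both entities of a tuple, emit it as a (noisy)
        # positive training example for that relation; the label may well be wrong.
        examples = []
        for sentence in sentences:
            for (e1, e2), relation in kb.items():
                if e1 in sentence and e2 in sentence:
                    examples.append((sentence, relation))
        return examples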


It all sounds the same to me, except that self-training seems to be slightly different in that the labeling heuristic is the trained classifier itself, and there is a loop between the labeling phase and the classifier-training phase. However, Yao, Riedel, and McCallum ("Collective cross-document relation extraction without labelled data." Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2010) claim that distant supervision == self-training == weak supervision.
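
Concretely, the self-training loop I have in mind looks roughly like this (a rough sketch over dense NumPy arrays, with made-up threshold and round counts):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95, rounds=5):
        # Here the labeling heuristic *is* the classifier being trained,
        # so labeling and training alternate in a feedback loop.
        classifier = LogisticRegression().fit(X_labeled, y_labeled)
        for _ in range(rounds):
            if X_unlabeled.shape[0] == 0:
                break
            probabilities = classifier.predict_proba(X_unlabeled)
            confident = probabilities.max(axis=1) >= threshold
            if not confident.any():
                break
            # Add the classifier's own confident predictions to the training set ...
            pseudo_labels = classifier.classes_[probabilities.argmax(axis=1)]
            X_labeled = np.vstack([X_labeled, X_unlabeled[confident]])
            y_labeled = np.concatenate([y_labeled, pseudo_labels[confident]])
            X_unlabeled = X_unlabeled[~confident]
            # ... and retrain on the enlarged set.
            classifier = LogisticRegression().fit(X_labeled, y_labeled)
        return classifier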

Also, are there other synonyms?

Franck Dernoncourt

1 Answer


There are two aspects to all the different terms you have given: 1] the process of obtaining training data, and 2] the algorithm that trains $f$, the classifier.

The algorithm that trains $f$, regardless of how the training data is obtained, is supervised. The differences between distant supervision, self-learning, self-supervision, and weak supervision therefore lie purely in how the training data is obtained.

Traditionally, any machine learning paper on supervised learning implicitly assumes that the training data is available and, for what it's worth, that the labels are precise: there is no ambiguity in the labels given to the instances in the training data. With distant/weak supervision papers, however, people realized that their training data has imprecise labels, and what they usually want to highlight in their work is that they obtain good results despite this obvious drawback (they may also have additional algorithmic ways to overcome the imprecise labels, such as an extra filtering step, and the papers usually like to highlight that these additional processes are important and useful). This gave rise to the terms "weak" and "distant" to indicate that the labels on the training data are imprecise. Note that this does not necessarily affect the learning aspect of the classifier: the classifier these papers use still implicitly assumes that the labels are precise, and the training algorithm itself is hardly ever changed.

Self-training, on the other hand, is somewhat special in that sense. As you have already observed, it obtains its labels from its own classifier and has a bit of a feedback loop for correction. Generally, we study supervised classifiers under the slightly larger purview of "inductive" algorithms, where the learnt classifier is an inductive inference made from the training data about the entire data. People have studied another form, called transductive inference, where the output of the algorithm is not a general inductive rule: the algorithm takes both the training data and the test data as input and produces labels on the test data. At some point, people figured, why not use transductive inference within inductive learning to obtain a classifier trained on a larger amount of data? This is simply referred to as induction with unlabeled data [1], and self-training falls under that.
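
As a concrete illustration of induction with unlabeled data: recent versions of scikit-learn ship a self-training wrapper for exactly this pattern (unlabeled points are marked with the label -1):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.semi_supervised import SelfTrainingClassifier

    X, y = make_classification(n_samples=300, random_state=0)
    y_semi = y.copy()
    y_semi[100:] = -1  # pretend the last 200 points are unlabeled

    # The base classifier pseudo-labels the unlabeled points and is refit on
    # them; the result is still an ordinary inductive classifier.
    model = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
    model.fit(X, y_semi)
    print(model.predict(X[:5]))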

Hopefully I have not confused you further; feel free to comment and ask for more clarification if necessary.

[1] Might be useful - http://www.is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/pdf2527.pdf

TenaliRaman
  • Thanks, your answer is very interesting! How about self-learning? Same as distant / weak supervision? – Franck Dernoncourt Nov 30 '14 at 04:31
  • Yes. I don't particularly see a difference between self-learning and distant/weak supervision, since the labels are obtained separately from an imprecise source and then fed to a supervised classifier. – TenaliRaman Nov 30 '14 at 09:15
  • The differentiation of transduction compared to induction, according to the slides from Tübingen, seems to be the same as the differentiation between eager learning and lazy learning. Or is there a difference? – Make42 Jun 24 '20 at 17:26
  • *Induction with unlabeled data* is basically what the imputation method MICE uses, I guess: “sequential regression multiple imputation” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/ It also seems to me that self-training is a form of "semi-supervised learning". What is the difference between that and *induction with unlabeled data*? – Make42 Jun 24 '20 at 17:30
  • I have learned that self-learning is something that e.g. autoencoders do, which would be different than the self-learning described here. – Make42 Jun 24 '20 at 17:34
  • @Make42 That is not quite self-learning; it is called self-supervision (that is, the supervised data is created from the unsupervised data itself). Another example of that is word2vec, where for every word you create a positive and negative word set and build a classifier. These are all examples of self-supervision, which is quite different from what is generally understood to be self-learning. – TenaliRaman Jun 25 '20 at 17:59
  • After reading other articles, e.g. http://ai.stanford.edu/blog/weak-supervision/, I understand that weak supervision means that labels contain uncertainty and that one type of weak supervision is distant supervision, namely that the labels are produced by another auxiliary mechanism (in contrast to non-expert human labelers). Right so far? So self-learning sounds identical to distant supervision... is there a difference? What is it? Also, since https://ai.stackexchange.com/a/10624/38174 was confusing regarding self-supervision, I opened https://ai.stackexchange.com/q/22176/38174... your 50ct there? – Make42 Jun 25 '20 at 20:06
  • 2) Does self-supervision *require* an auxiliary task to be solved (instead of directly solving the actual task we are interested in)? 3) Does self-supervision always contain a distant supervision component? (Both an auxiliary task and a distant supervision component are involved in Word2Vec, autoencoders, and GANs.) – Make42 Jun 25 '20 at 20:15