Questions tagged [semi-supervised-learning]

Semi-supervised learning refers to machine learning tasks that use a mix of labeled and unlabeled data. The goal is to learn a mapping from inputs to outputs, or to obtain outputs for particular unlabeled inputs. The unlabeled data are used to learn about the underlying structure of the inputs, which can improve learning about the relationship between inputs and outputs. Semi-supervised learning involves elements of both supervised and unsupervised learning.
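As a concrete illustration of the idea in this description, here is a minimal sketch using scikit-learn's `LabelSpreading` (one of many possible semi-supervised approaches, not the only one): unlabeled points are marked with `-1`, and labels propagate over the structure of the inputs. The dataset and parameter choices are illustrative.

```python
# Semi-supervised learning sketch: most points are unlabeled (-1),
# and LabelSpreading propagates the few known labels over the
# neighborhood structure of the inputs.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

# Pretend most labels are unknown: keep only 5 labeled examples per class.
y_partial = np.full_like(y, -1)
labeled_idx = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]])
y_partial[labeled_idx] = y[labeled_idx]

model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

# transduction_ holds the inferred label for every point, labeled or not.
accuracy = (model.transduction_ == y).mean()
print(f"accuracy on all points: {accuracy:.2f}")
```

With only 10 labels, the propagated labels recover most of the true classes because the unlabeled points reveal the two-cluster structure.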

140 questions
30 votes, 1 answer

Distant supervision: supervised, semi-supervised, or both?

"Distant supervision" is a learning scheme in which a classifier is learned given a weakly labeled training set (training data is labeled automatically based on heuristics / rules). I think that both supervised learning, and semi-supervised…
29 votes, 3 answers

Unsupervised, supervised and semi-supervised learning

In the context of machine learning, what is the difference between unsupervised learning, supervised learning, and semi-supervised learning? And what are some of the main algorithmic approaches to look at?
28 votes, 2 answers

What's the intuition behind the contrastive learning approach?

Maybe a noob's query, but recently I have seen a surge of papers on contrastive learning (a subset of semi-supervised learning). Some of the prominent and recent research papers I read that detail this approach are: Representation…
23 votes, 2 answers

What is the manifold assumption in semi-supervised learning?

I am trying to figure out what the manifold assumption means in semi-supervised learning. Can anyone explain it in a simple way? I cannot get the intuition behind it. It says that your data lie on a low-dimensional manifold embedded in a…
21 votes, 4 answers

"Semi-supervised learning" - is this overfitting?

I was reading the report of the winning solution of a Kaggle competition (Malware Classification). The report can be found in this forum post. The problem was a classification problem (nine classes, the metric was the logarithmic loss) with 10000…
21 votes, 3 answers

How to predict outcome with only positive cases as training?

For the sake of simplicity, let's say I'm working on the classic example of spam/not-spam emails. I have a set of 20000 emails. Of these, I know that 2000 are spam but I don't have any example of not-spam emails. I'd like to predict whether the…
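One common baseline for this positive-only setting is one-class classification: fit a model to the positive (spam) class alone, then flag new emails by whether they resemble it. A minimal sketch with scikit-learn's `OneClassSVM` follows; the synthetic features and the `nu` threshold are illustrative choices, not from the question.

```python
# One-class baseline for positive-only training data: learn the
# "spam" region from spam examples alone, then score new emails.
# Synthetic 3-dimensional features stand in for real email features.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
spam = rng.normal(loc=2.0, scale=0.5, size=(2000, 3))   # known spam
unknown = np.vstack([
    rng.normal(loc=2.0, scale=0.5, size=(50, 3)),        # spam-like emails
    rng.normal(loc=-2.0, scale=0.5, size=(50, 3)),       # very different emails
])

# nu bounds the fraction of training points treated as outliers.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(spam)

# +1 = resembles the training (spam) class, -1 = outlier (likely not spam)
pred = clf.predict(unknown)
print("flagged as spam-like:", int((pred == 1).sum()), "of", len(unknown))
```

This only answers "does this look like spam?"; more refined positive-unlabeled (PU) learning methods also exploit the unlabeled pool directly.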
19 votes, 4 answers

Why does using pseudo-labeling non-trivially affect the results?

I've been looking into semi-supervised learning methods, and have come across the concept of "pseudo-labeling". As I understand it, with pseudo-labeling you have a set of labeled data as well as a set of unlabeled data. You first train a model on…
asked by R.M.
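The pseudo-labeling loop described in this excerpt can be sketched as follows; the base classifier and the 0.95 confidence threshold are illustrative choices.

```python
# Pseudo-labeling sketch: train on labeled data, label the unlabeled
# points the model is confident about, then retrain on the union.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_lab, y_lab = X[:50], y[:50]   # small labeled set
X_unlab = X[50:]                # unlabeled pool (true labels hidden)

# Step 1: fit on labeled data only.
model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# Step 2: pseudo-label unlabeled points predicted with high confidence.
proba = model.predict_proba(X_unlab)
confident = proba.max(axis=1) >= 0.95   # threshold is a tunable choice
pseudo_y = proba.argmax(axis=1)[confident]

# Step 3: retrain on labeled + pseudo-labeled data.
X_aug = np.vstack([X_lab, X_unlab[confident]])
y_aug = np.concatenate([y_lab, pseudo_y])
model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)

print(f"added {int(confident.sum())} pseudo-labeled points")
```

Whether this helps depends on how accurate the confident predictions are; wrong pseudo-labels can reinforce the initial model's mistakes, which is exactly what the question probes.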
13 votes, 1 answer

Is there any difference between distant supervision, self-training, self-supervised learning, and weak supervision?

From what I have read, a distant supervision algorithm usually has the following steps: (1) it may have some labeled training data; (2) it has access to a pool of unlabeled data; (3) it has an operator that allows it to sample…
12 votes, 3 answers

Classification with partially "unknown" data

Suppose I want to learn a classifier that takes a vector of numbers as input, and gives a class label as output. My training data consists of a large number of input-output pairs. However, when I come to testing on some new data, this data is…
9 votes, 2 answers

How to find weights for a dissimilarity measure

I want to learn (deduce) attribute weights for my dissimilarity measure that I can use for clustering. I have some examples $(a_i,b_i)$ of pairs of objects that are "similar" (should be in the same cluster), as well as some examples $(c_i,d_i)$ of…
8 votes, 1 answer

What does the term "gold label" refer to in the context of semi-supervised classification?

Throughout the Snorkel tutorial here https://github.com/HazyResearch/snorkel and in the team's related white paper there are references to "gold labels", but the term evades definition. What are "gold labels" in the semi-supervised classification…
asked by raldy
8 votes, 4 answers

Semi-supervised classification with unseen classes

Consider the following problem. You have a large dataset, some small subset of which has labels from the classes A, B, and C. I would like to classify the unlabelled subset of items, each of which can be from classes A, B, and C or (crucially) also…
asked by graffe
8 votes, 2 answers

Incorporate new unlabeled data into classifier trained on a small set of labeled data

I have a set of 400 labeled samples (8 numeric features) on which I trained a binary classifier. The problem I am facing is that once the classifier is shipped to the users, I will get additional samples, but those will be unlabeled. I was…
asked by user695652
7 votes, 2 answers

Binary classification when many binary features are missing

I'm working on a binary classification problem, with about 1000 binary features in total. The problem is that for each datapoint, I only know the values of a small subset of the features (around 10-50), and the features in this subset are pretty…
asked by raegtin
7 votes, 0 answers

Computation of log-likelihood in semi-supervised Naive Bayes

I have the following two questions about log-likelihood computation in semi-supervised Naive Bayes. I have read in several documents online that, in every EM iteration of semi-supervised Naive Bayes, the log-likelihood is positive. Is this always…