What does the term "gold label" refer to in the context of semi-supervised classification?

Question

Throughout the Snorkel tutorial at https://github.com/HazyResearch/snorkel and in the team's related white paper, there are references to "gold labels", but the term is never defined.

What are 'gold labels' in the semi-supervised classification context?

Thank you.

score 8 · Accepted Answer · answered Mar 14 '18 at 15:08

From https://hazyresearch.github.io/snorkel/blog/snark.html:

We call this type of training data weak supervision because it’s noisier and less accurate than the expensive, manually-curated “gold” labels that machine learning models are usually trained on. However, Snorkel automatically de-noises this noisy training data, so that we can then use it to train state-of-the-art models.

As I understand it, the goal of Snorkel is to generate a large set of synthetic training data for large-scale ML algorithms by learning from a much smaller set of hand-labeled training data. The hand-labeled data were labeled by subject-matter experts, so we can be much more certain that their labels are correct (but obtaining a large set of such data may be prohibitively expensive, hence the impetus for Snorkel in the first place). So it appears they are calling these hand-applied labels "gold" labels, because they represent a reliable ground-truth value. This can be contrasted with the labels output by the algorithm, which are hopefully of high quality but are still subject to noise by construction.
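To make the contrast concrete, here is a minimal sketch of how a small gold-labeled set might be used to check noisy, automatically produced labels. It does not use Snorkel's API: the heuristics (`lf_contains_free`, `lf_has_link`, `lf_short_greeting`), the example texts, and the plain majority vote are all hypothetical stand-ins chosen for illustration, and the vote is a much simpler combiner than the de-noising step the quoted blog post describes.

```python
from collections import Counter

ABSTAIN = -1  # a heuristic may decline to vote

# Hypothetical heuristics (not Snorkel's API).
# Each returns 1 (spam), 0 (not spam), or ABSTAIN.
def lf_contains_free(text):
    return 1 if "free" in text.lower() else ABSTAIN

def lf_has_link(text):
    return 1 if "http" in text.lower() else ABSTAIN

def lf_short_greeting(text):
    return 0 if len(text.split()) <= 3 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_has_link, lf_short_greeting]

def weak_label(text):
    """Majority vote over non-abstaining heuristics; ABSTAIN on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

def accuracy_against_gold(texts, gold_labels):
    """Fraction of non-abstained weak labels that agree with the gold labels."""
    pairs = [(weak_label(t), g) for t, g in zip(texts, gold_labels)]
    scored = [(w, g) for w, g in pairs if w != ABSTAIN]
    if not scored:
        return 0.0
    return sum(w == g for w, g in scored) / len(scored)

# A tiny hand-labeled ("gold") set, invented for this example.
texts = [
    "Win a FREE phone at http://example.com",
    "Hi there",
    "Meeting notes are free to share with the team",
    "See you tomorrow at the usual place",
]
gold = [1, 0, 0, 0]

weak = [weak_label(t) for t in texts]
accuracy = accuracy_against_gold(texts, gold)
print(weak, accuracy)
```

The third text shows the kind of noise the answer refers to: the "free" heuristic fires on a harmless message, and the disagreement is only visible because a trusted gold label exists for that example.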

Thank you, I was needing a source to confirm it and you've provided. Do you know if this is also an industry-wide term? — raldy, Mar 14 '18 at 20:48
@raldy Oftentimes the phrase "gold standard" is used to refer to a measurement system whose outputs are known to be accurate and trustworthy. So I assume the terminology they use here is a slight offshoot of that, but I can't be 100% sure. If you think my post adequately answered your question, please consider marking this as the "accepted" answer by clicking the little checkmark to the left side of the post. — klumbard, Mar 14 '18 at 20:54
Done. If you are interested in feedback, your provided overview of Snorkel and comparison of the value of different sources of supervision was beyond the scope of the question and information I already had. — raldy, Mar 14 '18 at 23:52
