The standard approach to classification problems with neural nets is to output a conditional distribution over classes. For example, in a softmax output layer, each unit corresponds to a possible class, and its activation gives the predicted probability that the input is a member of that class. Typically, the network is trained to minimize the cross entropy loss for a set of labeled training points.
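For concreteness, here is a minimal sketch of that standard setup in PyTorch. The layer sizes, input dimension, and batch size are arbitrary placeholders, not anything specific to your problem.

```python
import torch
import torch.nn as nn

K = 10  # number of classes (illustrative)

# Final linear layer has one unit per class; softmax is applied inside the loss.
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, K),
)

loss_fn = nn.CrossEntropyLoss()        # standard cross entropy with hard labels
logits = model(torch.randn(32, 784))   # dummy batch of 32 inputs
labels = torch.randint(0, K, (32,))    # hard integer class labels
loss = loss_fn(logits, labels)
loss.backward()
```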
In your case, the training labels aren't completely known. But knowing that a point may belong to certain classes and cannot belong to others still carries useful information for training the network. This information can be exploited using soft labels.
A soft label is a probability distribution over classes, rather than a single definite class. So, if there are $K$ classes, a soft label for training point $x$ is a vector $p = [p_1, \dots, p_K]$, where $p_i$ represents the known probability that $x$ is a member of class $i$. For example, if $x$ is known to be a member of class $j$, then $p_j=1$ and all other $p_{i \ne j} = 0$. Or, suppose we know that $x$ must be a member of one of $m$ possible classes, but can't say which. By the principle of maximum entropy, we should set $p_i$ to $\frac{1}{m}$ for each possible class and to zero for all others. More specific information about class probabilities could also be represented, if known.
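A short sketch of constructing such maximum-entropy soft labels; the helper name `soft_label` is just for illustration.

```python
import numpy as np

K = 5  # number of classes (illustrative)

def soft_label(possible_classes, K):
    """Maximum-entropy soft label: uniform mass over the classes the
    point could belong to, zero for all other classes."""
    p = np.zeros(K)
    p[list(possible_classes)] = 1.0 / len(possible_classes)
    return p

soft_label({2}, K)      # fully known label    -> [0. , 0. , 1. , 0. , 0. ]
soft_label({0, 3}, K)   # class 0 or class 3   -> [0.5, 0. , 0. , 0.5, 0. ]
```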
Given a training set with soft labels, a network can be trained by minimizing the cross entropy between the predicted class probabilities and the soft labels. This is identical to how a network is typically trained with ordinary, hard labels (which are a special case of soft labels where probabilities are either 0 or 1). But the cross entropy loss must be computed properly to account for the soft labels: for predicted class probabilities $q = [q_1, \dots, q_K]$ and soft label $p$, the loss for a single point is $H(p, q) = -\sum_{i=1}^K p_i \log q_i$ (see here for details).
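A minimal sketch of that loss in PyTorch; the function name `soft_cross_entropy` and the example soft labels are hypothetical. (Newer versions of PyTorch's built-in `CrossEntropyLoss` also accept probability targets directly, but writing it out makes the computation explicit.)

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_labels):
    """Cross entropy between predicted probabilities and soft labels:
    H(p, q) = -sum_i p_i * log q_i, averaged over the batch."""
    log_q = F.log_softmax(logits, dim=1)
    return -(soft_labels * log_q).sum(dim=1).mean()

logits = torch.randn(3, 5, requires_grad=True)   # dummy batch, 5 classes
p = torch.tensor([[0.0, 0.0, 1.0, 0.0, 0.0],     # hard label as a special case
                  [0.5, 0.0, 0.0, 0.5, 0.0],     # known to be class 0 or 3
                  [0.2, 0.2, 0.2, 0.2, 0.2]])    # completely unknown class
loss = soft_cross_entropy(logits, p)
loss.backward()
```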