The standard approach to classification problems with neural nets is to output a conditional distribution over classes. For example, in a softmax output layer, each unit corresponds to a possible class, and its activation gives the predicted probability that the input is a member of that class. Typically, the network is trained to minimize the cross entropy loss for a set of labeled training points.
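For concreteness, here is a minimal sketch of that standard setup in PyTorch. The layer sizes, input dimension, and batch size are arbitrary placeholders, not anything specific to your problem.

```python
import torch
import torch.nn as nn

K = 10  # number of classes (illustrative)

# Final linear layer has one unit per class; softmax is applied inside the loss.
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, K),
)

loss_fn = nn.CrossEntropyLoss()        # standard cross entropy with hard labels
logits = model(torch.randn(32, 784))   # dummy batch of 32 inputs
labels = torch.randint(0, K, (32,))    # hard integer class labels
loss = loss_fn(logits, labels)
loss.backward()
```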
In your case, the training labels aren't completely known. But knowing that a point may belong to certain classes and cannot belong to others still carries useful information for training the network. This information can be exploited using soft labels.
A soft label is a probability distribution over classes, rather than a single definite class. So, if there are $K$ classes, a soft label for training point $x$ is a vector $p = [p_1, \dots, p_K]$, where $p_i$ represents the known probability that $x$ is a member of class $i$. For example, if $x$ is known to be a member of class $j$, then $p_j=1$ and all other $p_{i \ne j} = 0$. Or, suppose we know that $x$ must be a member of one of $m$ possible classes, but can't say which. By the principle of maximum entropy, we should set $p_i$ to $\frac{1}{m}$ for each possible class and to zero for all others. More specific information about class probabilities could also be represented, if known.
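A short sketch of constructing such maximum-entropy soft labels; the helper name `soft_label` is just for illustration.

```python
import numpy as np

K = 5  # number of classes (illustrative)

def soft_label(possible_classes, K):
    """Maximum-entropy soft label: uniform mass over the classes the
    point could belong to, zero for all other classes."""
    p = np.zeros(K)
    p[list(possible_classes)] = 1.0 / len(possible_classes)
    return p

soft_label({2}, K)      # fully known label    -> [0. , 0. , 1. , 0. , 0. ]
soft_label({0, 3}, K)   # class 0 or class 3   -> [0.5, 0. , 0. , 0.5, 0. ]
```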
Given a training set with soft labels, a network can be trained by minimizing the cross entropy between the predicted class probabilities and the soft labels. This is identical to how a network is typically trained with ordinary, hard labels (which are a special case of soft labels where probabilities are either 0 or 1). But the cross entropy loss must be computed properly to account for the soft labels: for predicted class probabilities $q = [q_1, \dots, q_K]$ and soft label $p$, the loss for a single point is $H(p, q) = -\sum_{i=1}^K p_i \log q_i$ (see here for details).
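A minimal sketch of that loss in PyTorch; the function name `soft_cross_entropy` and the example soft labels are hypothetical. (Newer versions of PyTorch's built-in `CrossEntropyLoss` also accept probability targets directly, but writing it out makes the computation explicit.)

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_labels):
    """Cross entropy between predicted probabilities and soft labels:
    H(p, q) = -sum_i p_i * log q_i, averaged over the batch."""
    log_q = F.log_softmax(logits, dim=1)
    return -(soft_labels * log_q).sum(dim=1).mean()

logits = torch.randn(3, 5, requires_grad=True)   # dummy batch, 5 classes
p = torch.tensor([[0.0, 0.0, 1.0, 0.0, 0.0],     # hard label as a special case
                  [0.5, 0.0, 0.0, 0.5, 0.0],     # known to be class 0 or 3
                  [0.2, 0.2, 0.2, 0.2, 0.2]])    # completely unknown class
loss = soft_cross_entropy(logits, p)
loss.backward()
```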