Suppose I want to learn a classifier that takes a vector of numbers as input, and gives a class label as output. My training data consists of a large number of input-output pairs.
However, at test time the new data is typically only partially complete. For example, if the input vector has length 100, only 30 of the elements might have values, and the rest are "unknown".
As an example of this, consider image recognition where it is known that part of the image is occluded. Or consider classification in a general sense where it is known that part of the data is corrupt. In all cases, I know exactly which elements in the data vector are the unknown parts.
How can I learn a classifier that works for this kind of data? I could just set the "unknown" elements to random numbers, but given that there are often more unknown elements than known ones, this does not sound like a good solution. Or, I could randomly change elements in the training data to "unknown" and train on these rather than on the complete data, but this might require exhaustive sampling of all combinations of known and unknown elements.
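To make the second idea concrete, here is a minimal sketch of what I have in mind, using NumPy. The function name `mask_batch`, the choice to fill hidden elements with zero, and the extra 0/1 mask appended to the input are all my own assumptions, not an established recipe. A fresh random mask is drawn every time a batch is used, instead of enumerating every combination up front:

```python
import numpy as np

def mask_batch(x, rng, min_known=0.1, max_known=1.0):
    """Randomly hide elements in each row of x (hypothetical helper).

    Returns an array of shape (n, 2 * d): the inputs with hidden
    elements set to zero, followed by a 0/1 mask (1 = known).
    """
    n, d = x.shape
    # Draw a different fraction of known elements for every sample.
    keep_prob = rng.uniform(min_known, max_known, size=(n, 1))
    mask = (rng.random((n, d)) < keep_prob).astype(x.dtype)
    # Appending the mask lets the classifier tell "unknown" from a true 0.
    return np.concatenate([x * mask, mask], axis=1)

rng = np.random.default_rng(0)
x_train = rng.normal(size=(4, 100))  # placeholder data, not real inputs
batch = mask_batch(x_train, rng)
print(batch.shape)  # (4, 200)
```

At test time the same layout would be used, except that the mask comes from the data itself rather than from the random generator.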
In particular, I am thinking about neural networks, but I am open to other classifiers.