I have a dataset containing students' test grades in different classes and personal information, together with a label from the set {'abandoned college', 'graduated', 'master', 'phd'} that describes how far each student went in their academic career.
My goal is to generate a continuous score that is directly proportional to how far a student is likely to get academically (e.g. on a 0-10 scale, a student scored 1 is unlikely to reach a master's, while a student scored 9 is very likely to reach a PhD).
(The idea was to compare this new metric with the GPA.)
My initial approach was to treat this as a classification problem and use the classifier's built-in probability estimates as a score. But because the label is multiclass, I end up with multiple probabilities per student, and those probabilities are often very sparse.
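One common way to collapse the multiple class probabilities into a single number is to exploit the label ordering and take the expected rank. A minimal sketch with scikit-learn, on made-up synthetic data (the features, label encoding 0-3, and logistic model are all placeholders for whatever you actually use):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data standing in for the real features; labels are
# encoded 0..3 in their natural order:
# abandoned college < graduated < master < phd.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.clip((X[:, 0] + rng.normal(scale=0.5, size=200) + 2).astype(int), 0, 3)

clf = LogisticRegression(max_iter=1000).fit(X, y)
proba = clf.predict_proba(X)          # shape (n_samples, n_classes)

# Collapse the per-class probabilities into one number: the expected rank
# under the predicted distribution, rescaled to a 0-10 scale.
expected_rank = proba @ clf.classes_
score_0_10 = 10 * expected_rank / clf.classes_.max()
```

Even when individual probabilities are near 0 or 1 ("sparse"), the expectation still varies smoothly across students, which is what a continuous score needs.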
Because the labels have a natural ordering, I thought of using regression instead, and simply scaling the predicted value. Even though that works numerically, I'm not sure it makes sense theoretically.
Is my goal even reachable, or is my problem ill-defined? I've seen similar questions mentioning ordinal regression, but I suspect it wouldn't be enough (just better than ordinary classification). Should I combine classification and regression, or just pick one?
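One standard ordinal-regression reduction (due to Frank & Hall) that sits between plain classification and plain regression is to train one binary classifier per threshold "is the outcome above rank k?" and difference the results into class probabilities. A self-contained sketch on synthetic data (all names and the base model are my own choices, not a fixed recipe):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical ordinal data: labels 0..3 in their natural order.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 2))
y = np.clip((X[:, 0] + 2).astype(int), 0, 3)

n_classes = 4
# One binary model per threshold: P(y > 0), P(y > 1), P(y > 2).
models = [LogisticRegression(max_iter=1000).fit(X, (y > k).astype(int))
          for k in range(n_classes - 1)]

# Stack P(y > k) columns, pad with P(y > -1) = 1 and P(y > 3) = 0,
# then difference: P(y = k) = P(y > k-1) - P(y > k).
p_gt = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
p_gt = np.column_stack([np.ones(len(X)), p_gt, np.zeros(len(X))])
proba = p_gt[:, :-1] - p_gt[:, 1:]   # shape (n_samples, n_classes)

# The expected rank again gives a single continuous score per student.
score = proba @ np.arange(n_classes)
```

One known caveat of this reduction: because the threshold models are fit independently, the differenced "probabilities" can occasionally go slightly negative, though they always sum to 1.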
UPDATE: As suggested by a friend, I tried ensemble stacking: two classifiers in the first layer and a regressor in the second. This made no significant difference in the output (although that could be bad execution on my part), but I think it makes a bit more sense theoretically: the regressor learns how to balance each classifier's opinion.
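For reference, one way to wire up that stacking scheme so the second layer doesn't overfit is to feed the regressor out-of-fold probabilities from the classifiers. A sketch with two arbitrary base classifiers and a ridge meta-regressor (all model choices are placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_predict

# Hypothetical ordinal data: labels 0..3.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = np.clip((X[:, 0] + 2).astype(int), 0, 3)

# First layer: out-of-fold class probabilities from two classifiers,
# so the meta-model never sees predictions made on training data.
clf1 = LogisticRegression(max_iter=1000)
clf2 = RandomForestClassifier(n_estimators=50, random_state=0)
p1 = cross_val_predict(clf1, X, y, cv=5, method='predict_proba')
p2 = cross_val_predict(clf2, X, y, cv=5, method='predict_proba')

# Second layer: a regressor learns to weigh the two sets of "opinions"
# into a single ordinal score.
meta_X = np.hstack([p1, p2])
meta = Ridge().fit(meta_X, y)
score = meta.predict(meta_X)
```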
UPDATE 2: It looks like learning to rank (L2R) would be my best bet, but there's still the problem of missing data. Because each student has taken different classes, the only way to compare them directly would be to engineer standard features available for everyone. So the problem also involves classification with missing features, which is hard in its own right.
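A common baseline for the missing-features part is to keep one column per class ever offered, with NaN where a student never took it, and impute inside a pipeline. A toy sketch (the grade matrix, missingness pattern, and label are all fabricated for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical grade matrix: one column per class ever offered (0-10 scale),
# NaN where a student never took that class.
rng = np.random.default_rng(4)
grades = rng.uniform(0, 10, size=(200, 6))
grades[rng.random(grades.shape) < 0.4] = np.nan

# Toy binary label just to make the pipeline runnable end to end.
y = (np.nan_to_num(grades, nan=5.0).mean(axis=1) > 5).astype(int)

# Impute missing grades with the per-class mean, then classify.
pipe = make_pipeline(SimpleImputer(strategy='mean'),
                     LogisticRegression(max_iter=1000))
pipe.fit(grades, y)
preds = pipe.predict(grades)
```

Mean imputation is a blunt instrument here (it erases the information that a student chose not to take a class); an "is-missing" indicator column, or a model such as scikit-learn's HistGradientBoostingClassifier that handles NaN natively in recent versions, might be worth trying instead.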