I've run across a type of classification problem that I don't think fits into the traditional multi-class framework, and I wanted to run it by you all to see if you have any ideas.
Let's say we have 3 drugs (A, B, and C) and we give one of them to each of 100 patients.
Then, for each patient, we have three types of information:
- Patient attributes
- Type of drug given (A, B, or C)
- Patient survival (dead or alive)
This is our data set. For each future patient, we want to be able to choose the drug (A, B, or C) that maximizes their chance of survival.
I'm coming from more of a machine learning/data analysis background, so I'm trying to figure out how to use the data to solve this problem. Typically, in a classification problem, I'd just set patient survival as the class variable and use all of the patient attributes as my input features. Here, however, the thing we want to predict is the "type of drug given", while the outcome we actually observe is survival, so each drug looks like its own separate binary classification problem.
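To make the "one binary problem per drug" framing concrete, here is a minimal sketch, not a recommended method. It assumes the drugs were assigned independently of the patient attributes, which the problem statement does not say. The data is synthetic, and the helper names (`fit_logistic`, `predict_proba`, `recommend`) are hypothetical; the hand-rolled logistic regression is only a stand-in for whatever binary classifier you would normally use.

```python
import math
import random

DRUGS = ["A", "B", "C"]

def predict_proba(w, x):
    """Survival probability from a logistic model; w[0] is the bias."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the log-loss (no regularization)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Synthetic stand-in for the 100 patients: one made-up attribute in [0, 1].
random.seed(0)
patients = []
for i in range(100):
    x = [random.random()]
    drug = DRUGS[i % 3]
    # Invented ground truth: A helps low x, B helps high x, C is a coin flip.
    p_alive = {"A": 1.0 - x[0], "B": x[0], "C": 0.5}[drug]
    patients.append((x, drug, int(random.random() < p_alive)))

# One binary (alive vs. dead) model per drug, fit only on that drug's patients.
models = {}
for d in DRUGS:
    rows = [(x, alive) for x, drug, alive in patients if drug == d]
    models[d] = fit_logistic([r[0] for r in rows], [r[1] for r in rows])

def recommend(x):
    """Pick the drug whose model predicts the highest survival probability."""
    return max(DRUGS, key=lambda d: predict_proba(models[d], x))

print({d: round(predict_proba(models[d], [0.1]), 2) for d in DRUGS})
print(recommend([0.1]))
```

Note that this uses every patient, dead or alive, since each per-drug model needs both outcomes to estimate a survival probability.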
One idea would be to use only the data for the "alive" patients and reduce it to a multiclass problem (with class labels A, B, and C). The rationale behind that approach is that you are trying to maximize the survival of future patients, so the patients who survived on drug X should give you more information than those who died on drug X. The major problem with this approach is that it throws away a large portion of the data, namely every patient who died.
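For reference, the survivors-only reduction could look roughly like the sketch below. The records are made-up values for illustration only, and the nearest-centroid rule (`centroids`, `predict`, both hypothetical names) is just a tiny placeholder for a real multiclass classifier.

```python
from collections import Counter

# Made-up records: (attributes, drug, alive). Values are illustrative only.
records = [
    ([0.2], "A", 1), ([0.3], "A", 1), ([0.8], "A", 0),
    ([0.7], "B", 1), ([0.9], "B", 1), ([0.1], "B", 0),
    ([0.5], "C", 1), ([0.4], "C", 0), ([0.6], "C", 0),
]

# Keep survivors only; the drug given becomes the class label.
survivors = [(x, drug) for x, drug, alive in records if alive == 1]
kept = len(survivors) / len(records)

def centroids(rows):
    """Mean attribute vector per drug, computed from the survivors."""
    sums, counts = {}, Counter()
    for x, drug in rows:
        acc = sums.setdefault(drug, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[drug] += 1
    return {d: [s / counts[d] for s in acc] for d, acc in sums.items()}

def predict(cents, x):
    """Return the drug whose survivor centroid is closest to x."""
    return min(cents, key=lambda d: sum((a - b) ** 2 for a, b in zip(cents[d], x)))

cents = centroids(survivors)
print(f"fraction of data used: {kept:.2f}")
print(predict(cents, [0.25]))
```

Printing the fraction of records kept makes the drawback visible: everything the classifier learns comes from the survivors, and the outcomes of the other patients never enter the model.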
Any ideas?