Binary Classification of Multiple Groups

Question

I've run across a type of classification problem that I don't think fits into the traditional multi-class framework. I wanted to run it by you to see if you have any ideas.

Let's say we have 3 drugs (A, B, and C) and we give them to 100 patients, with each patient receiving one of them.

For each patient we have three types of information:

Patient attributes
Type of drug given (A, B, or C)
Patient survival (dead or alive)

This is our data set. For future patients, we want to be able to give each one the drug (A, B, or C) that maximizes their chance of survival.

I'm coming from more of a machine learning/data analysis background, so I'm trying to figure out how to use the data to solve this problem. Typically, in a classification problem, I'd just set patient survival as the class variable and use all of the patient attributes as my input features. Here, however, the thing I want to predict is the type of drug to give, while the observed outcome is still binary (dead or alive), so each drug looks like its own separate binary classification problem.

One idea would be to use only the data for the "alive" patients and reduce it to a multiclass problem (with class labels A, B, and C). The rationale is that you are trying to maximize survival of future patients, so the patients who survived on drug X will give you more information than those who died on drug X. The major problem with this approach is that it discards a large portion of the data.

Any ideas?

Why not just use a logistic regression with Dead/Alive as the dependent variable, dummy variables for drug (A, B, C), and your patient characteristics as covariates? That will give you odds ratios for the drugs. — , Mar 05 '12 at 20:29
Normally we are also interested in survival times, as we tend to have censored data due to follow-up, so you could do a survival analysis if you are interested in survival rates over time. That way, duration of follow-up is not confounded with survival status, which it could be if a plain regression is used instead. — Michelle, Mar 05 '12 at 20:36
@rosser Also had this idea, but I cannot get rid of the feeling that if one treats "drug" as a predictor just as any other patient attribute, then "drug" may be undervalued when the model is fitting the problem "globally". I cannot express this explicitly, I might be wrong. — mlwida, Mar 06 '12 at 13:10
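
For what it's worth, here is a minimal sketch of the logistic regression idea from the comments, with drug-by-covariate interaction terms added so that the effect of "drug" is allowed to differ from patient to patient. Everything in it is an assumption made for illustration and not something from the question: the single scaled covariate `age`, the synthetic data and its `TRUE_LOGIT` rules, the helper names (`design_row`, `fit_logistic`, `best_drug`), and the hand-rolled gradient ascent, which only stands in for a proper statistics package. It also assumes drugs were assigned independently of patient attributes, which may not hold for real observational data.

```python
import math
import random

DRUGS = ["A", "B", "C"]

def design_row(age, drug):
    """Intercept, covariate, drug dummies (A is the reference level),
    and drug x covariate interaction terms."""
    d_b = 1.0 if drug == "B" else 0.0
    d_c = 1.0 if drug == "C" else 0.0
    return [1.0, age, d_b, d_c, age * d_b, age * d_c]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, n_iter=1000):
    """Batch gradient ascent on the mean log-likelihood (no regularization)."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w

def survival_prob(w, age, drug):
    """Predicted probability of survival if this patient were given `drug`."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, design_row(age, drug))))

def best_drug(w, age):
    """Score the same patient under each drug and pick the highest probability."""
    return max(DRUGS, key=lambda d: survival_prob(w, age, d))

# Synthetic data, invented purely for this sketch: drug A helps patients with
# low `age`, drug B helps patients with high `age`, drug C is uniformly mediocre.
TRUE_LOGIT = {
    "A": lambda a: -3.0 * a,
    "B": lambda a: 3.0 * a,
    "C": lambda a: -1.0,
}
rng = random.Random(0)
rows, labels = [], []
for i in range(300):
    drug = DRUGS[i % 3]              # drug assigned independently of age
    age = rng.uniform(-1.0, 1.0)     # covariate already scaled to [-1, 1]
    alive = 1 if rng.random() < sigmoid(TRUE_LOGIT[drug](age)) else 0
    rows.append(design_row(age, drug))
    labels.append(alive)

w = fit_logistic(rows, labels)
for age in (-0.8, 0.8):
    print(age, best_drug(w, age))
```

The point of the design is that the model is still a single binary classifier on dead/alive that uses every patient, and the drug recommendation comes from scoring a new patient once per drug and taking the largest predicted survival probability. Without the interaction terms, the drug dummies would only shift the log-odds by a constant, so the same drug would be recommended to everyone.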

