In a binary classification problem, if all the predictors (independent variables) are strictly binary {0,1}, is there a specific way to preprocess the data? Assume there are no missing or noisy data points.

More Info: I am currently working on a project with gene expression data. The predictors are completely binary and I am trying to classify genes into two categories, "cancerous" or "non cancerous". I have tried using logistic regression, SVM (linear and RBF) and random forests as suggested in this thread; however, the AUC for my application is about 0.65 for all the classifiers. I am trying to get the AUC up. What approach should I be using? I hope there is a specific way to handle this kind of data, and that this is not simply a situation where the inter-class separation is low.

  • How come gene expression is binary? Maybe you mean that your response (cancer / no cancer) is binary? Then logistic regression would be one of the many ways you can analyze your data. Or do you want to predict gene expression from binary predictors (cancer / no cancer)? In that case, you can use ANOVA. – January Jul 22 '19 at 06:35
  • I mean my predictors or independent variables are binary. The "gene expression" data here is the output of a graph model, not of traditional methods such as microarrays, where the data is continuous. So I would like to classify between non cancerous and cancerous genes in this context, where all the predictors are binary valued. – Aditya Lahiri Jul 22 '19 at 22:11
  • Ah. I apologize. In that case – no, not as far as I can tell, you can directly plug in your binary predictors in the logistic model. – January Jul 23 '19 at 05:18
  • If all three models return similar results, then that score may be the best you can hope for with your data. – user2974951 Jul 23 '19 at 09:22
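
To make the two points in the comments concrete, here is a minimal sketch in plain Python. It is not the asker's pipeline: the data is synthetic, and the helper names (`auc`, `fit_logistic`, `predict_proba`) are made up for illustration. It feeds raw 0/1 predictors straight into a logistic model, with no centering or scaling, and scores it by held-out AUC.

```python
import math
import random


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the log-loss; X holds the raw 0/1 features."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# Synthetic data (an assumption for illustration): 10 binary predictors,
# of which only the first three weakly influence the label.
random.seed(0)
X = [[random.randint(0, 1) for _ in range(10)] for _ in range(400)]
y = [int(random.random() < sigmoid(-1.0 + 0.8 * (x[0] + x[1] + x[2]))) for x in X]

# Simple hold-out split: first 300 rows for fitting, last 100 for scoring.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = fit_logistic(X_train, y_train)
test_auc = auc(predict_proba(X_test, w, b), y_test)
print(f"held-out AUC: {test_auc:.3f}")
```

The labels here are deliberately noisy, so even the model that generated them cannot separate the classes perfectly. When several different classifiers plateau at a similar AUC on real data, one plausible reading is a ceiling of this kind, not a missing preprocessing step.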
