I have a data set with a binary response variable and about 30,000 observations of 8 features, some continuous and some categorical.

This is an imbalanced data set: the ratio of negatives to positives is about 5:1, so the null accuracy (always predicting the negative class) is about 5/6 ≈ 84%.
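As a quick sanity check, the null accuracy follows directly from the class ratio. A minimal sketch, using the 5:1 ratio from the question:

```python
# Null accuracy of the majority-class baseline for a 5:1
# negative-to-positive ratio (numbers taken from the question).
negatives, positives = 5, 1
null_accuracy = negatives / (negatives + positives)
print(f"{null_accuracy:.1%}")  # about 83.3%, i.e. roughly the ~84% quoted
```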

I know that for imbalanced data sets accuracy is usually not a good metric, but in my context high accuracy is desirable, and because the imbalance is not extreme I think improvement is possible. I would like to shoot for at least 90% accuracy.

I have tried various feature engineering techniques and machine learning models, but I am not even able to hit 86%. For example, decision trees, logistic regression and random forests all give about 85.6% to 85.8% accuracy. I used cross-validation to fine-tune hyperparameters and checked training accuracy to make sure there was no overfitting.
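The workflow described above can be sketched as follows. This is a hedged illustration only: the real data set is not available, so `make_classification` stands in for it with the same rough shape (30,000 rows, 8 features, ~5:1 imbalance), and the models and cross-validation mirror what the question reports.

```python
# Sketch of the reported experiment on synthetic stand-in data:
# compare cross-validated accuracy of several models against the
# majority-class baseline.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data with roughly the question's shape: 30,000 rows,
# 8 features, about 5:1 negative-to-positive imbalance.
X, y = make_classification(n_samples=30_000, n_features=8,
                           weights=[5 / 6], random_state=0)

models = {
    "baseline (always negative)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")
```

On the synthetic data the exact numbers will differ from the question's 85.6%-85.8%; the point is the comparison structure, not the values.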

What could be the reasons for getting such a marginal improvement in accuracy over a dumb model?

Ali
  • "because the imbalance is not extreme I think it is possible to improve" - whether your data are imbalanced or not has no bearing on whether you can improve on a simple model. You can't improve on 50% accuracy for predicting a coin toss, no matter how much (balanced) data you have. You can't improve on 16.7% accuracy for predicting a die roll, no matter how much (imbalanced) data you have. – Stephan Kolassa Jan 20 '19 at 17:19
  • Thanks for the link and comment Stephan. My point was that it's not like the null accuracy is 99.9%. Since you mentioned prediction of completely random events, are you implying that the response variable is almost random with the current set of features? – Ali Jan 20 '19 at 18:18
  • What leads you to believe that your features are strongly predictive of your outcome? – Sycorax Jan 20 '19 at 18:49
  • Anyhow, accuracy is not a proper scoring rule. See https://stats.stackexchange.com/questions/359909/is-accuracy-an-improper-scoring-rule-in-a-binary-classification-setting – kjetil b halvorsen Jan 20 '19 at 20:08
  • Ali, what @Stephan and Sycorax are asking is why you think you can do better. Are you sure the variables are suitably discriminatory? (No one else can decide that for you.) – seanv507 Jan 20 '19 at 20:27
  • Sorry for the late response. Using other metrics such as AUC I was able to come up with a good model (AUC > 0.8). Thank you very much! – Ali May 19 '19 at 20:19
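The final comment mentions switching from accuracy to AUC. The thread does not show how that was done, but a minimal sketch of cross-validated ROC AUC with scikit-learn (again on synthetic stand-in data, since the real set is unavailable) would be:

```python
# Sketch (assumed, not from the thread): evaluating a classifier by
# cross-validated ROC AUC instead of accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=30_000, n_features=8,
                           weights=[5 / 6], random_state=0)

auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                      cv=5, scoring="roc_auc").mean()
print(f"mean cross-validated AUC: {auc:.3f}")
```

Unlike accuracy, AUC is insensitive to the 5:1 class imbalance, which is why it can reveal a usable ranking model even when accuracy barely beats the majority-class baseline.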

0 Answers