I have a data set with a binary response variable: about 30,000 observations of 8 features, some continuous and some categorical.
It is an imbalanced data set: the ratio of negatives to positives is about 5:1, so the null accuracy (always predicting the negative class) is roughly 84%.
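Just to make the baseline concrete, here is the arithmetic behind the null accuracy (the 5:1 ratio is from the question; an exact 5:1 split gives 5/6 ≈ 83.3%, consistent with the "~84%" figure):

```python
# Null accuracy = accuracy of always predicting the majority (negative) class.
# With a negative-to-positive ratio of about 5:1, negatives make up 5/6 of the data.
negatives, positives = 5, 1
null_accuracy = negatives / (negatives + positives)
print(f"null accuracy ≈ {null_accuracy:.1%}")  # ≈ 83.3%
```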
I know that accuracy is usually not a good metric for imbalanced data sets. But in my context high accuracy is desirable, and since the imbalance is not extreme, I think there is room to improve. I would like to reach at least 90% accuracy.
I have tried various feature engineering techniques and machine learning models, but I cannot even hit 86%. For example, decision trees, logistic regression, and random forests all give about 85.6% to 85.8% accuracy. I used cross-validation for fine-tuning hyperparameters and checked the training accuracy to make sure there was no overfitting.
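For reference, this is a minimal sketch of the evaluation setup described above, using scikit-learn with a synthetic stand-in for the real data (the `make_classification` call and its parameters are my assumptions, chosen to mimic the stated shape and class ratio, not the actual data set):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the real data: ~30,000 rows, 8 features,
# with roughly a 5:1 negative-to-positive class ratio.
X, y = make_classification(
    n_samples=30_000,
    n_features=8,
    weights=[5 / 6],  # ~83% negatives
    random_state=0,
)

# Cross-validated accuracy for two of the models mentioned in the question.
for model in (
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=0),
):
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{type(model).__name__}: {scores.mean():.3f} ± {scores.std():.3f}")
```

On the real data the question would be whether those cross-validated scores meaningfully exceed the ~84% null accuracy, rather than the raw number alone.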
What could be the reasons for such a marginal improvement in accuracy over a dumb baseline model?