I have a binary classification problem with a fairly balanced dataset: 56% class 0 and 44% class 1. I trained RandomForest, XGBoost, and LightGBM models. My features are a mix of categorically encoded, frequency-encoded, and ordinally encoded columns, plus two numerical features. Accuracy, AUC, and precision all look good (all in the high 90s). But when I look at the confusion matrix, the TP count is way higher than the total number of actual positives in the test set, and the TN count similarly exceeds the actual negatives. That should be impossible, since TP + FN must equal the number of actual positives. I have done the following:
- Checked class imbalance – the dataset is fairly balanced (56/44).
- Feature encoding – tried full integer encoding as well as the setup described above (see the sketch after this list).
- Models – tried different ones, but did not tune hyperparameters.
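
For context, my preprocessing looks roughly like the sketch below (made-up column names and data; scikit-learn has no built-in frequency encoder, so that step is done manually on the DataFrame):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical columns standing in for my real features.
cat_cols = ["cat_a"]           # categorically (one-hot) encoded
ord_cols = ["ord_a"]           # ordinally encoded
freq_cols = ["freq_a"]         # frequency encoded (done manually below)
num_cols = ["num_a", "num_b"]  # the two numerical features

df = pd.DataFrame({
    "cat_a": ["x", "y", "x", "z"],
    "ord_a": ["low", "mid", "high", "low"],
    "freq_a": ["p", "p", "q", "r"],
    "num_a": [1.0, 2.5, 3.1, 0.7],
    "num_b": [10, 20, 30, 40],
})

# Frequency encoding: replace each category by its relative frequency.
for c in freq_cols:
    df[c] = df[c].map(df[c].value_counts(normalize=True))

pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ("ord", OrdinalEncoder(categories=[["low", "mid", "high"]]), ord_cols),
], remainder="passthrough")  # frequency + numeric columns pass through

X = pre.fit_transform(df)
```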
Is there anything else I should look at to diagnose this problem?
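
For reference, here is a minimal, self-contained sketch of the sanity check I would expect to pass (synthetic data standing in for my real test labels and predictions). By construction, TP + FN must equal the actual positives and TN + FP the actual negatives, so TP alone can never exceed the actual positive count:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def check_confusion_counts(y_true, y_pred):
    """Verify confusion-matrix counts against the actual class totals."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # A length mismatch between labels and predictions (e.g. test labels
    # paired with train predictions) is one way impossible counts arise.
    assert len(y_true) == len(y_pred), "label/prediction length mismatch"

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    actual_pos = int((y_true == 1).sum())
    actual_neg = int((y_true == 0).sum())

    # Row sums of the confusion matrix must match the actual class counts.
    print(f"TP={tp}, FN={fn}, TP+FN={tp + fn}, actual positives={actual_pos}")
    print(f"TN={tn}, FP={fp}, TN+FP={tn + fp}, actual negatives={actual_neg}")

# Example with synthetic labels and predictions:
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)
y_pred = rng.integers(0, 2, size=100)
check_confusion_counts(y_true, y_pred)
```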