I have a binary classification problem with a fairly balanced dataset: 56% class 0 and 44% class 1. I trained Random Forest, XGBoost, and LightGBM models. My features are a mix of categorically encoded, frequency encoded, and ordinally encoded columns, plus two numerical features. I am getting good accuracy, and good AUC and precision too – they are all in the high 90s. But when I look at the confusion matrix, the TP count is way higher than the number of actual positives in the test set, and the same happens with the TN count. I have done the following:

  • Checked class imbalance – the dataset is fairly balanced.
  • Tried different feature encodings – full integer encoding, as well as the setup described above.
  • Tried different models, but didn't tune their hyperparameters.

Is there anything else I can look at to diagnose this problem?
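
By definition, TP + FN must equal the number of actual positives in the test set, so a TP count above that total suggests the matrix is being computed on the wrong data or read with the wrong orientation. Below is a minimal sketch of the sanity check I mean, assuming a scikit-learn workflow – the synthetic data, split, and model are placeholders, not my actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the real dataset (roughly 56% / 44% classes).
X, y = make_classification(n_samples=5000, weights=[0.56, 0.44], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)  # predict on the TEST set only, never train + test

# sklearn convention: rows are true labels, columns are predictions,
# so with labels=[0, 1] the layout is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

# Invariants: TP + FN = actual positives, TN + FP = actual negatives.
assert tp + fn == (y_test == 1).sum()
assert tn + fp == (y_test == 0).sum()
print(f"TP={tp}, FN={fn}, actual positives={(y_test == 1).sum()}")
```

If the asserts fail on the real pipeline, the matrix is likely being built from more rows than the test set, or its row/column orientation is being misread.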

– The Pointer
  • Accuracy and related metrics aren't really a great way to evaluate models. https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models It's not clear that there's a problem here, other than the deficiencies of using a hard classification. – Sycorax Apr 23 '21 at 16:54
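
To make the comment's point concrete, here is a minimal sketch contrasting accuracy (a hard-classification metric that thresholds probabilities) with proper scoring rules evaluated on the predicted probabilities themselves. It assumes scikit-learn; the model and data are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, brier_score_loss, log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.56, 0.44], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard classification: thresholds probabilities at 0.5 and discards calibration.
acc = accuracy_score(y_test, model.predict(X_test))

# Proper scoring rules: score the predicted probabilities directly.
proba = model.predict_proba(X_test)[:, 1]
print(f"accuracy    = {acc:.3f}")
print(f"Brier score = {brier_score_loss(y_test, proba):.3f}")
print(f"log loss    = {log_loss(y_test, proba):.3f}")
```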

0 Answers