
I am working on a logistic regression model. The ratio of 1s to 0s in the training dataset is 7:2 (14,000 1s and 4,000 0s).

The model performance is:

Accuracy: 83%

True positive rate (sensitivity): 84%

True negative rate (specificity): 74%

Area under the ROC curve (AUC): 0.78
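A few lines of arithmetic help interpret these numbers. Accuracy is a class-weighted average of the true positive and true negative rates, so it is best compared with the accuracy of always predicting the majority class. AUC equals the probability that a randomly chosen 1 receives a higher predicted score than a randomly chosen 0, with ties counted as half (the normalized Mann-Whitney U statistic). The sketch below assumes the rates were measured on a sample with the same 14,000/4,000 class mix as the training data, which the question does not state. The helper names are hypothetical, and the scores passed to `auc_pairwise` are made-up values for illustration only.

```python
from itertools import product


def accuracy_from_rates(tpr, tnr, n_pos, n_neg):
    """Accuracy implied by a TPR/TNR pair on a sample with n_pos 1s and n_neg 0s."""
    return (tpr * n_pos + tnr * n_neg) / (n_pos + n_neg)  # (TP + TN) / N


def auc_pairwise(scores_pos, scores_neg):
    """P(random 1 outscores random 0), ties counting 0.5; O(n_pos * n_neg) pairs."""
    wins = 0.0
    for sp, sn in product(scores_pos, scores_neg):
        wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


N_POS, N_NEG = 14_000, 4_000  # class counts from the question (assumed same mix)

baseline = accuracy_from_rates(1.0, 0.0, N_POS, N_NEG)  # always predict 1
model = accuracy_from_rates(0.84, 0.74, N_POS, N_NEG)  # reported TPR / TNR
balanced = (0.84 + 0.74) / 2  # balanced accuracy ignores the class mix

# Hypothetical predicted probabilities for a few 1s and 0s, illustration only
toy_auc = auc_pairwise([0.9, 0.8, 0.7, 0.55, 0.4], [0.6, 0.5, 0.3, 0.2])

print(f"always-1 baseline accuracy: {baseline:.3f}")  # 0.778
print(f"model accuracy (same mix):  {model:.3f}")  # 0.818
print(f"balanced accuracy:          {balanced:.3f}")  # 0.790
print(f"toy AUC:                    {toy_auc:.2f}")  # 0.85
```

Under the same-mix assumption, the reported rates imply about 81.8% accuracy rather than 83%. The 83% was therefore presumably measured on a sample with a different class balance. Either way, the always-1 baseline of about 77.8% is the natural reference point for accuracy. AUC, by contrast, depends neither on a cutoff nor on the class proportions. For 18,000 rows, a rank-based AUC computation would be much faster than this pairwise loop.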

A couple of questions:

  1. Do I need to handle the class imbalance in this case? I have read that the proportion of 0s to 1s does not matter; what matters is the absolute number of cases in the rarer class. Is the number of rare events here (4,000) large enough?

  2. Is an AUC of 0.78 acceptable? I have already included all available variables, and adding more variables to the model is not an option.
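Two small calculations make the first question more concrete. The first computes the weights a "balanced" class-weighting scheme would assign. This uses the formula scikit-learn documents for `class_weight="balanced"`, `n_samples / (n_classes * count_of_class)`. For a correctly specified logistic model, weighting the classes this way leaves the slopes unchanged and only adds `log(w_pos / w_neg)` to the intercept, so it acts much like moving the classification cutoff. The second applies the common events-per-variable heuristic of roughly 10 or more events of the rarer class per candidate predictor. That threshold is a rule of thumb, not a hard requirement, and the variable names are hypothetical.

```python
import math

N_POS, N_NEG = 14_000, 4_000  # 1s and 0s in the training data (from the question)
n_samples, n_classes = N_POS + N_NEG, 2

# "Balanced" class weights: n_samples / (n_classes * count of that class)
w_pos = n_samples / (n_classes * N_POS)  # weight on each 1
w_neg = n_samples / (n_classes * N_NEG)  # weight on each 0

# Assuming a correctly specified logistic model, class weighting keeps the
# slopes and shifts the intercept by log(w_pos / w_neg).
intercept_shift = math.log(w_pos / w_neg)

# Hence "weighted model + 0.5 cutoff" classifies like "unweighted model + this cutoff"
equivalent_cutoff = w_neg / (w_pos + w_neg)

# Events-per-variable heuristic: about 10 events of the rarer class per predictor
EPV = 10  # rule of thumb, not a hard requirement
max_predictors = N_NEG // EPV

print(f"weights: 1s={w_pos:.3f}, 0s={w_neg:.3f}")  # 1s=0.643, 0s=2.250
print(f"intercept shift: {intercept_shift:.3f}")  # -1.253
print(f"equivalent cutoff on the unweighted model: {equivalent_cutoff:.3f}")  # 0.778
print(f"predictors supported at {EPV} events/variable: {max_predictors}")  # 400
```

With these numbers, the heuristic would allow on the order of 400 candidate predictors. Balanced weighting with a 0.5 cutoff behaves like the unweighted model with a cutoff of about 0.78, the prevalence of 1s. This is one reason choosing the decision threshold is often discussed as an alternative to resampling or reweighting.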

kjetil b halvorsen
  • Related: http://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression – ilanman Dec 25 '16 at 14:08
  • I think you will find the information you need in the linked thread. Please read it. If it isn't what you want / you still have a question afterwards, come back here & edit your question to state what you learned & what you still need to know. Then we can provide the information you need without just duplicating material elsewhere that already didn't help you. – gung - Reinstate Monica Sep 07 '17 at 16:05
