
I have an imbalanced dataset for predicting bankruptcy using logistic regression (logit). My sample contains 2% (200) bankrupt firms. Unfortunately, my predictions are worthless, with an AUROC of 0.52. On top of that, the model fitted on the imbalanced dataset does not explain even 1% of the variation in the dependent variable. Strikingly, with (randomly) balanced data the predictive power is much higher.

My data contains enough bankrupt firms and is a good approximation of the full population, so I wonder whether it is wise to compensate for the imbalance in my sample. More generally, what are the advantages and disadvantages of compensating for imbalance?

I hope someone can help me out.

Pat
    Do any of these questions and answers relate, or how does your question differ? https://stats.stackexchange.com/questions/330927/handling-categorical-and-ordinal-data-with-highly-imbalanced-classes https://stats.stackexchange.com/questions/282002/what-is-the-best-statistical-method-for-assigning-cases-to-one-of-two-groups/282171#282171 https://stats.stackexchange.com/questions/107874/how-to-deal-with-a-skewed-class-in-binary-classification-having-many-features/121089#121089 – ReneBt Aug 24 '18 at 10:39

1 Answer


You could try tuning the classification threshold to trade off sensitivity against specificity and improve your accuracy. This post might help: "How to select a threshold for logistic regression in case of imbalance in class distribution".
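To make the threshold idea concrete, here is a minimal sketch in plain Python. It assumes you already have predicted probabilities from your logit model, and it uses Youden's J (sensitivity + specificity - 1) as the selection criterion; that criterion is my assumption and only one of several reasonable choices, not necessarily what the linked post recommends. The function names and the toy data are made up for illustration. Note that moving the threshold only changes where you sit on the ROC curve; it does not change the AUROC itself.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Return (sensitivity, specificity) when predicting 1 for y_prob >= threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def best_threshold(y_true, y_prob):
    """Pick the observed probability that maximizes Youden's J = sens + spec - 1."""
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(y_prob)):  # every observed score is a candidate cut-off
        sens, spec = sensitivity_specificity(y_true, y_prob, t)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy, made-up data: 1 = bankrupt, 0 = healthy; probabilities are hypothetical.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.08, 0.12, 0.10, 0.30]

# With the default 0.5 cut-off no firm is flagged as bankrupt.
default_sens, default_spec = sensitivity_specificity(y_true, y_prob, 0.5)

# A much lower cut-off catches the rare class at the cost of some false alarms.
chosen_t, chosen_j = best_threshold(y_true, y_prob)
```

With rare events the fitted probabilities are usually far below 0.5, so the default cut-off tends to predict "not bankrupt" for everyone. Ideally you would pick the threshold on held-out data rather than on the sample used to fit the model.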

Sjoseph