
My dataset has 3 million samples and unbalanced labels.

I have tried many neural network approaches, but I couldn't get a good result.

Which approach would you suggest I follow in this case to get better results?

Thanks,

yusuf

2 Answers


The main reason analysts have trouble with unbalanced cases is that they are using improper accuracy scoring rules in their optimization procedure. If you use a probability estimation method (e.g., logistic regression) and choose a proper objective function (e.g., the likelihood), you will not have that problem.
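To illustrate the point about scoring rules, here is a minimal sketch in plain Python. The synthetic data, the 0.30 and 0.01 event rates, and the three hypothetical predictors are all assumptions made up for this example; none of them come from the question. It compares classification accuracy at a 0.5 cutoff with the mean negative log-likelihood (log loss), which is a proper scoring rule.

```python
import math
import random


def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of correct hard classifications at the given cutoff."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean negative Bernoulli log-likelihood (a proper scoring rule)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


random.seed(0)
n = 100_000

# Assumed synthetic data: one binary feature; the event is rare either way.
x = [1 if random.random() < 0.1 else 0 for _ in range(n)]
true_p = [0.30 if xi else 0.01 for xi in x]
y = [1 if random.random() < p else 0 for p in true_p]

# Three hypothetical predictors, all of which say "negative" at a 0.5 cutoff.
always_negative = [0.0] * n        # hard "never happens"
base_rate = [sum(y) / n] * n       # ignores the feature
informed = true_p                  # uses the feature

for name, p in [("always_negative", always_negative),
                ("base_rate", base_rate),
                ("informed", informed)]:
    print(f"{name:16s} accuracy={accuracy(y, p):.4f} log_loss={log_loss(y, p):.4f}")
```

All three predictors get exactly the same accuracy, because none of them ever crosses the cutoff, so accuracy cannot tell them apart. The log loss ranks them in the order you would want: the predictor that uses the feature scores best, and the hard "never happens" predictor scores worst.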

Frank Harrell

You can try using stratified cross-validation [1]. There are also some other suggestions, such as those in [2]. These of course don't guarantee success, but they can help with issues related to unbalanced labels.

[1] Understanding stratified cross-validation

[2] https://stats.stackexchange.com/a/133385/64720
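As a rough illustration of what stratification does, here is a minimal sketch in plain Python. The helper `stratified_folds` and the 5% positive rate are hypothetical, chosen only for this example. In practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal the indices out round-robin.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Assumed toy data: 5% positives, 95% negatives.
labels = [1] * 50 + [0] * 950
folds = stratified_folds(labels, k=5)

for i, fold in enumerate(folds):
    positives = sum(labels[j] for j in fold)
    print(f"fold {i}: size={len(fold)}, positives={positives}")
```

Each fold ends up with the same share of the rare class as the full dataset, so no validation fold is left with too few positives to evaluate on.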

erensezener