I have a dataset with 3 million samples and unbalanced labels.
I have tried many neural network approaches, but I couldn't get good results.
Which approach would you suggest I follow in this case?
Thanks,
The main reason analysts have trouble with unbalanced cases is that they use improper accuracy scoring rules, such as classification accuracy, in their optimization procedure. If you use a probability estimation method (e.g., logistic regression) and choose a proper objective function (e.g., the likelihood), you will not have that problem.
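To illustrate the point about scoring rules, here is a minimal sketch on made-up toy data (1% positives). The two sets of predicted probabilities, the 0.5 threshold, and the helper names `accuracy` and `log_loss` are assumptions chosen for illustration, not something from the question. Both "models" get the same accuracy, but the log loss (the negative mean log-likelihood) separates them.

```python
import math

def accuracy(y_true, p_pred, threshold=0.5):
    # Fraction of samples whose thresholded prediction matches the label.
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

def log_loss(y_true, p_pred, eps=1e-12):
    # Negative mean Bernoulli log-likelihood; lower is better.
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# Toy data (assumed): 10 positives, 990 negatives.
y = [1] * 10 + [0] * 990

# Model A: says "almost certainly negative" for every sample.
p_a = [0.001] * 1000
# Model B: gives positives a clearly higher probability, still below 0.5.
p_b = [0.4] * 10 + [0.01] * 990

acc_a, acc_b = accuracy(y, p_a), accuracy(y, p_b)
ll_a, ll_b = log_loss(y, p_a), log_loss(y, p_b)

print(f"accuracy: A={acc_a:.3f}  B={acc_b:.3f}")
print(f"log loss: A={ll_a:.4f}  B={ll_b:.4f}")
```

Accuracy cannot tell the two apart because neither model ever crosses the threshold, while the log loss rewards Model B for ranking the rare class higher.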
You can try using stratified cross-validation [1]. Other suggestions are given in [2]. These of course don't guarantee success, but they can help with issues related to unbalanced labels.
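As a rough sketch of what stratified splitting does, the snippet below deals the indices of each class round-robin across the folds, so every fold keeps roughly the overall class ratio. The function name `stratified_folds` and the toy labels are hypothetical. In practice a library routine such as scikit-learn's `StratifiedKFold` would likely be used instead.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Return k lists of indices, each with roughly the overall class ratio."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's indices round-robin across the folds.
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds

# Toy labels (assumed): 5% positives.
labels = [1] * 50 + [0] * 950
folds = stratified_folds(labels, k=5)

for n, fold in enumerate(folds):
    positives = sum(labels[i] for i in fold)
    print(f"fold {n}: size={len(fold)} positives={positives}")
```

Each fold serves once as the validation set while the remaining folds are used for training, so every validation score is computed on data containing the minority class.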