
I am facing a binary classification problem, and I don't know whether I should use more data or not.
I have one label 'A' with 10 training examples, and another label 'B', also with 10 training examples. By feeding these 20 examples to a multi-layer perceptron, I can obtain a model that performs well on classifying both labels 'A' and 'B'.
My question is: if I intentionally increase the number of training examples of label 'A' to 1000, will this destroy the model's ability to classify label 'B', given that 10 training examples of label 'B' are enough to train an MLP model?
Any insights and materials are welcome.
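
To make the question concrete, here is a minimal sketch of the experiment I have in mind, assuming synthetic two-dimensional Gaussian data and scikit-learn's MLPClassifier (both the data and the model settings are illustrative assumptions, not my actual setup). It compares per-class scores when label 'A' has 10 versus 1000 training examples:

```python
# Minimal imbalance experiment: does inflating class 'A' to 1000
# examples hurt performance on class 'B'? Synthetic Gaussian blobs
# stand in for the real data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

rng = np.random.RandomState(0)

def make_data(n_a, n_b):
    # Two well-separated clusters: 'A' around (0, 0), 'B' around (3, 3).
    X = np.vstack([rng.normal(0.0, 1.0, size=(n_a, 2)),
                   rng.normal(3.0, 1.0, size=(n_b, 2))])
    y = np.array(["A"] * n_a + ["B"] * n_b)
    return X, y

X_test, y_test = make_data(200, 200)  # balanced held-out set

for n_a in (10, 1000):  # balanced vs. heavily imbalanced training set
    X_train, y_train = make_data(n_a, 10)
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                        random_state=0).fit(X_train, y_train)
    print(f"n_A = {n_a}")
    print(classification_report(y_test, clf.predict(X_test)))
```

The per-class recall for 'B' in the two reports shows directly whether adding more 'A' examples degrades the minority class.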

Lion Lai
  • See https://stats.stackexchange.com/questions/283170/when-is-unbalanced-data-really-a-problem-in-machine-learning but I'd say that the greatest problem with your data is that you have only 10 training examples of class 'B'. In your case I'd consider removing them from the data and using some kind of anomaly detection algorithm to classify the non-'A' classes (see the sketch after these comments). Moreover, with such a small dataset, neural networks do not seem like a good choice of algorithm. – Tim Nov 27 '17 at 07:48
  • @Tim: thanks for your quick response. In my case, I really do want to focus on label 'B', and 10 training examples are enough to train a model to predict label 'B'. So what concerns me is whether, and why, more training examples of another class would sabotage the training process. – Lion Lai Nov 27 '17 at 08:00
  • This is answered in the link I provided. How do you know that 10 cases are enough? How do you know that the model will learn anything and not overfit? – Tim Nov 27 '17 at 08:18
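
A minimal sketch of the anomaly-detection approach Tim suggests, assuming scikit-learn's OneClassSVM and the same kind of synthetic data as above (the specific model and parameters are illustrative assumptions, not part of Tim's comment): fit on class 'A' examples only and treat everything the model rejects as non-'A'.

```python
# One-class approach: learn the boundary of class 'A' alone, then
# flag points outside it as non-'A' (here, class 'B').
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(1)
X_a_train = rng.normal(0.0, 1.0, size=(1000, 2))  # class 'A' only

oc = OneClassSVM(nu=0.05, gamma="scale").fit(X_a_train)

X_new = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),   # 'A'-like points
                   rng.normal(3.0, 1.0, size=(5, 2))])  # 'B'-like points
pred = oc.predict(X_new)  # +1 = inlier ('A'), -1 = outlier (non-'A')
print(["A" if p == 1 else "not-A" for p in pred])
```

This sidesteps the imbalance entirely: class 'B' never enters training, so its 10 examples can be kept purely for evaluation.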

0 Answers