I'm trying to train a neural network for rare event detection. As a result, I have roughly 1000 times more non-target (everything else) examples than target examples. I was wondering: if I simply repeat the set of target examples until the two classes are balanced, what effect would that have on my classification and generalisation performance? Would I gain anything? What price would I be paying this way?
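For concreteness, here is a rough sketch of the kind of duplication I have in mind (hypothetical variable names; `X` and `y` are my training arrays and label 1 marks the target class):

```python
import numpy as np

def oversample_by_duplication(X, y, target_label=1, seed=0):
    """Repeat target-class rows (with replacement) until both classes have the same count."""
    rng = np.random.default_rng(seed)
    target_idx = np.where(y == target_label)[0]
    other_idx = np.where(y != target_label)[0]

    # Draw target indices with replacement up to the majority-class size.
    repeated_idx = rng.choice(target_idx, size=len(other_idx), replace=True)
    all_idx = np.concatenate([other_idx, repeated_idx])
    rng.shuffle(all_idx)  # spread the duplicated rows across training batches
    return X[all_idx], y[all_idx]
```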
And in general, what is the best/most common way to deal with such situations? I can't do boosting or bagging, as I cannot afford to train several models. I have computational and memory restrictions, and I will have only one model (e.g. one neural network) for decision making at test time.
Thanks!
* To be clear, my question is: what is the effect of duplicating the target-class examples with as many copies as required to balance the number of examples in both classes? More specifically, I am using a neural network as the model. What do I gain or lose by doing this?