
I'm training a classification model on highly imbalanced data, where hits are pairs that have a similarity of almost 1 on an independent metric; everything else is a non-hit. Does it make sense to pick negative samples from pairs that are at least somewhat similar? Will it make the model more powerful?
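To make the idea concrete, here is a minimal sketch of subsampling "hard" negatives, i.e. keeping only non-hit pairs whose similarity on the independent metric exceeds a threshold, then balancing against the positives. The function name, threshold, and data are illustrative assumptions, not anything from the question:

```python
import numpy as np

def sample_hard_negatives(similarity, is_hit, min_sim=0.5, ratio=1.0, seed=0):
    """Return indices of all positives plus a random subset of 'hard'
    negatives (non-hits with similarity >= min_sim), targeting a
    positive:negative ratio of roughly 1:ratio."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(is_hit)
    # hard negatives: non-hits that are still somewhat similar
    hard_neg_idx = np.flatnonzero(~is_hit & (similarity >= min_sim))
    n_neg = min(len(hard_neg_idx), int(len(pos_idx) * ratio))
    neg_idx = rng.choice(hard_neg_idx, size=n_neg, replace=False)
    return np.concatenate([pos_idx, neg_idx])

# Toy data: 30 hits with similarity ~1, 1000 non-hits spread over [0, 0.9]
sim = np.concatenate([np.full(30, 0.99), np.linspace(0.0, 0.9, 1000)])
hit = np.concatenate([np.ones(30, bool), np.zeros(1000, bool)])
idx = sample_hard_negatives(sim, hit, min_sim=0.5, ratio=1.0)
```

With `ratio=1.0` this yields the 50/50 split mentioned in the comments; the caveat is that a model trained only on hard negatives never sees easy ones, so its calibration on the full distribution should be checked separately.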

Sergej Andrejev
  • How could limiting the information available to your model (throwing away data) make it more powerful? – Tim Feb 12 '18 at 19:50
  • My class imbalance is 30,000 hits to 1,000,000 non-hits. I can get about 50/50 if I use only non-hits that are somewhat similar on this independent metric. – Sergej Andrejev Feb 12 '18 at 21:08

0 Answers