
I'm training a classification model on highly imbalanced data, where hits are pairs that have a similarity of almost 1 on an independent metric; everything else is a non-hit. Does it make sense to pick negative samples from pairs that are at least somewhat similar? Will it make the model more powerful?
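To make the idea concrete, here is a minimal sketch of subsampling "hard" negatives, i.e. keeping only non-hit pairs whose similarity on the independent metric exceeds a threshold, then balancing against the positives. The function name, threshold, and data are illustrative assumptions, not anything from the question:

```python
import numpy as np

def sample_hard_negatives(similarity, is_hit, min_sim=0.5, ratio=1.0, seed=0):
    """Return indices of all positives plus a random subset of 'hard'
    negatives (non-hits with similarity >= min_sim), targeting a
    positive:negative ratio of roughly 1:ratio."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(is_hit)
    # hard negatives: non-hits that are still somewhat similar
    hard_neg_idx = np.flatnonzero(~is_hit & (similarity >= min_sim))
    n_neg = min(len(hard_neg_idx), int(len(pos_idx) * ratio))
    neg_idx = rng.choice(hard_neg_idx, size=n_neg, replace=False)
    return np.concatenate([pos_idx, neg_idx])

# Toy data: 30 hits with similarity ~1, 1000 non-hits spread over [0, 0.9]
sim = np.concatenate([np.full(30, 0.99), np.linspace(0.0, 0.9, 1000)])
hit = np.concatenate([np.ones(30, bool), np.zeros(1000, bool)])
idx = sample_hard_negatives(sim, hit, min_sim=0.5, ratio=1.0)
```

With `ratio=1.0` this yields the 50/50 split mentioned in the comments; the caveat is that a model trained only on hard negatives never sees easy ones, so its calibration on the full distribution should be checked separately.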

Sergej Andrejev
  • How could limiting the information available to your model (throwing away data) make it more powerful? – Tim Feb 12 '18 at 19:50
  • My class imbalance is 30,000 hits to 1,000,000 non-hits. I can get about 50/50 if I use only non-hits that are somewhat similar on this independent metric. – Sergej Andrejev Feb 12 '18 at 21:08

0 Answers