
I have a dataset with about 1M negative examples and 4,700 positive examples. I'm trying to build a classifier that predicts the probability that an example is positive.

Given how skewed the data is, should I just give up, or are there algorithms that perform well on skewed data?
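For concreteness, here is a minimal sketch of the kind of probability-outputting model I have in mind. The use of scikit-learn's `LogisticRegression` with `class_weight='balanced'` is just an assumed example of handling the imbalance, not something I'm committed to:

```python
# Minimal sketch (assumed setup, not my actual pipeline): class-weighted
# logistic regression that reweights the rare positive class and outputs
# a probability for each example via predict_proba.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy stand-in for the ~1M negative / ~4.7k positive dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 10))
y = (rng.random(100_000) < 0.005).astype(int)  # ~0.5% positives

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# class_weight='balanced' upweights positives roughly by n_neg / n_pos.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

# Predicted probability of the positive class, not just a hard 0/1 label.
p_pos = clf.predict_proba(X_test)[:, 1]
```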

  • Take a look at http://gking.harvard.edu/category/research-interests/methods/rare-events. – dimitriy Oct 14 '14 at 17:46
  • See [this](http://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression) question, or [this](http://stats.stackexchange.com/questions/24245/low-probability-levels-when-doing-logistic-regression) one or [this](http://stats.stackexchange.com/questions/66753/do-i-need-a-balanced-sample-50-yes-50-no-to-run-logistic-regression) one. – Glen_b Oct 14 '14 at 18:27
