
I have a dataset with about 1M negative examples and 4,700 positive examples. I'm trying to build a classifier that predicts the probability that an example is positive.

Given how skewed the data is, should I just give up, or are there algorithms that perform well on skewed data?
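For concreteness, here is a minimal sketch of the kind of probability-outputting model I have in mind. The use of scikit-learn's `LogisticRegression` with `class_weight='balanced'` is just an assumed example of handling the imbalance, not something I'm committed to:

```python
# Minimal sketch (assumed setup, not my actual pipeline): class-weighted
# logistic regression that reweights the rare positive class and outputs
# a probability for each example via predict_proba.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy stand-in for the ~1M negative / ~4.7k positive dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 10))
y = (rng.random(100_000) < 0.005).astype(int)  # ~0.5% positives

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# class_weight='balanced' upweights positives roughly by n_neg / n_pos.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

# Predicted probability of the positive class, not just a hard 0/1 label.
p_pos = clf.predict_proba(X_test)[:, 1]
```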

  • Take a look at http://gking.harvard.edu/category/research-interests/methods/rare-events. – dimitriy Oct 14 '14 at 17:46
  • See [this](http://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression) question, or [this](http://stats.stackexchange.com/questions/24245/low-probability-levels-when-doing-logistic-regression) one or [this](http://stats.stackexchange.com/questions/66753/do-i-need-a-balanced-sample-50-yes-50-no-to-run-logistic-regression) one. – Glen_b Oct 14 '14 at 18:27
