How do you deal with imbalanced data when you're doing regression?

Asked Jan 06 '18 at 12:11

Active Jan 06 '18 at 16:53

Viewed 74 times

To describe my problem: I'm predicting the price of an item from some text and other features in an ad. The training data contains many cheap items, some medium-priced items, and few expensive items.

I apply log1p to the prices, but even then, unsurprisingly, my model gives good results when the real price of an item is low and very bad predictions when the item is expensive.
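To make the setup concrete, here is a minimal sketch of training on log1p(price), inverting with expm1, and measuring error separately per price range. The prices, the bucket cut-offs, and the constant "model" are all made up for illustration; they are not my actual data or model.

```python
import math
from collections import defaultdict

# Hypothetical prices, skewed toward cheap items as in the question.
prices = [5, 8, 10, 12, 15, 20, 25, 60, 80, 900]

# Targets on the log1p scale. A constant prediction (the mean of the
# log targets) stands in for the real model here.
log_targets = [math.log1p(p) for p in prices]
log_prediction = sum(log_targets) / len(log_targets)

# expm1 inverts log1p, bringing the prediction back to the price scale.
prediction = math.expm1(log_prediction)


def bucket(price):
    """Assign a price to a coarse range (cut-offs are assumptions)."""
    if price < 50:
        return "cheap"
    if price < 500:
        return "medium"
    return "expensive"


# Collect absolute errors per bucket instead of one global score.
abs_errors = defaultdict(list)
for p in prices:
    abs_errors[bucket(p)].append(abs(p - prediction))

mae_by_bucket = {name: sum(errs) / len(errs) for name, errs in abs_errors.items()}
for name, mae in mae_by_bucket.items():
    print(f"{name}: MAE = {mae:.2f}")
```

Reporting the error per bucket rather than as a single number is what makes the gap between cheap and expensive items visible.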

Is there a way to balance my dataset? I tried oversampling with imblearn, but even a small amount of it causes a lot of overfitting, which I think is due to the text data.
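A rough sketch of the kind of oversampling I mean, written with the standard library rather than imblearn so it stays self-contained. Since the target is continuous, the prices are first grouped into bins and the rarer bins are topped up by sampling with replacement. The rows, the bin cut-offs, and the `bucket` helper are hypothetical.

```python
import random
from collections import Counter, defaultdict

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical (ad_text, price) rows, mostly cheap.
raw_prices = [5, 8, 10, 12, 15, 20, 25, 60, 80, 900]
rows = [(f"ad {i}", price) for i, price in enumerate(raw_prices)]


def bucket(price):
    """Assign a price to a coarse range (cut-offs are assumptions)."""
    if price < 50:
        return "cheap"
    if price < 500:
        return "medium"
    return "expensive"


# Group rows by price bin.
by_bucket = defaultdict(list)
for row in rows:
    by_bucket[bucket(row[1])].append(row)

# Top up every bin to the size of the largest one by drawing
# existing rows with replacement (plain random oversampling).
target = max(len(group) for group in by_bucket.values())
balanced = []
for group in by_bucket.values():
    balanced.extend(group)
    balanced.extend(random.choices(group, k=target - len(group)))

counts = Counter(bucket(price) for _, price in balanced)
copies_of_expensive_ad = sum(1 for text, _ in balanced if text == "ad 9")
print(counts)
print("copies of the single expensive ad:", copies_of_expensive_ad)
```

In this toy case the bins come out equal in size, but the one expensive ad appears several times with identical text, which would be consistent with the overfitting I'm seeing.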

edited Jan 06 '18 at 16:53

deemel


asked Jan 06 '18 at 12:11

Michael


Your post seems to focus on the point estimate from the regression. Have you also considered the variance or standard deviation of the point estimates? I suspect the predictions on the expensive items have a high variance, which is exactly what you would expect and want. – Lucas Roberts Jan 06 '18 at 14:17

Hard to answer a question about data without seeing the data. Include it, please. – Carl Jan 06 '18 at 16:16
Possible duplicate of [When is unbalanced data really a problem in Machine Learning?](https://stats.stackexchange.com/questions/283170/when-is-unbalanced-data-really-a-problem-in-machine-learning) – kjetil b halvorsen Dec 13 '18 at 23:56


0 Answers