To describe my problem: I'm predicting the price of an item from some text and other features in an ad. The training data contains many cheap items, some medium-priced items, and only a few expensive ones.
I apply log1p to the prices, but even then, unsurprisingly, the model only gives good results when the real price of an item is low. When the item is expensive, the prediction is very bad.
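To make "good on cheap, bad on expensive" measurable, here is a minimal sketch of the log1p transform, its inverse, and an error breakdown per price range. The function names and the bucket edges (100 and 1000) are hypothetical and only for illustration; they are not taken from my actual pipeline.

```python
import math

def to_log(prices):
    """Forward transform applied to the target before training."""
    return [math.log1p(p) for p in prices]

def from_log(log_prices):
    """Inverse transform that maps model output back to prices."""
    return [math.expm1(v) for v in log_prices]

def mae_by_bucket(y_true, y_pred, edges=(100, 1000)):
    """Mean absolute error per price bucket.

    `edges` are assumed cut points between cheap, medium and expensive items.
    Returns one value per bucket, or None for a bucket with no items.
    """
    sums = [0.0] * (len(edges) + 1)
    counts = [0] * (len(edges) + 1)
    for t, p in zip(y_true, y_pred):
        b = sum(t >= e for e in edges)  # index of the bucket the true price falls in
        sums[b] += abs(t - p)
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Calling `mae_by_bucket` on the real prices and on the predictions after `from_log` gives one error per price range, which shows how much worse the expensive bucket is than the cheap one.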
Is there a way to balance my dataset? I tried oversampling with imblearn, but even a small amount causes a lot of overfitting, I think because of the text data.
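For clarity, by oversampling I mean something like the sketch below: bin the continuous price into ranges, then duplicate rows at random until every range is as large as the biggest one. This is plain random oversampling written with the standard library, not imblearn's API, and the function name, bucket edges and seed are assumptions made for illustration.

```python
import random
from collections import defaultdict

def oversample_by_bucket(rows, prices, edges=(100, 1000), seed=0):
    """Duplicate rows at random so each price bucket reaches the size of the largest.

    `edges` are assumed cut points between cheap, medium and expensive items.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for i, price in enumerate(prices):
        buckets[sum(price >= e for e in edges)].append(i)
    target = max(len(idx) for idx in buckets.values())
    chosen = []
    for idx in buckets.values():
        chosen.extend(idx)  # keep every original row once
        chosen.extend(rng.choices(idx, k=target - len(idx)))  # pad with random duplicates
    return [rows[i] for i in chosen], [prices[i] for i in chosen]
```

Because the padding consists of exact copies, the few expensive ads (and their text) are repeated many times, which may be where the overfitting comes from.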