I have a scaling problem. Say my target variable is a net revenue column with a range of (-34624455, 298878399), so the max - min value is 333502854.
Now in the test set I have a record whose revenue value is 2185, which when normalized converts to 0.1038.
For this record, the value predicted by a simple linear regression is 0.1037 (unlikely, but let's just assume). This converts back to -40209.0402, which is nowhere near the actual value of 2185. I understand that this is because of the crazy range I've got, but how do I scale this sort of data? I tried removing the outliers, thinking it might reduce the effect of the range, but even in the subset with no outliers the range is still crazy, and I see the same effect: the predicted value in its normalized/scaled form is close enough to the normalized/scaled actual, but when I convert it back to the original scale the result is not even close. What kind of scaling techniques should I use for this kind of data?
For now I used a simple min-max scaling method: (x - min) / (max - min).
Steps listed below:
2185 - (-34624455) = 34626640 # Subtracting the min value
34626640 / 333502854 = 0.103827117 # Dividing by the range
Assume the predicted value is 0.1037
0.1037 * 333502854 = 34584245.96 # Multiplying by the range
34584245.96 + (-34624455) = -40209.0402 # Adding the min value back
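Here is a minimal Python sketch of what I'm doing (the function names are just for illustration, not from any library), which reproduces the numbers above:

```python
# Min-max round trip with the numbers from my dataset.
x_min, x_max = -34624455, 298878399
x_range = x_max - x_min                  # 333502854

def min_max_scale(x):
    return (x - x_min) / x_range         # (x - min) / (max - min)

def inverse_scale(x_scaled):
    return x_scaled * x_range + x_min    # undo the scaling

actual = 2185
print(min_max_scale(actual))             # ~0.103827117

predicted_scaled = 0.1037                # off by only ~0.000127
print(inverse_scale(predicted_scaled))   # ~-40209.04

# The normalized error gets multiplied by the range on the way back:
print((min_max_scale(actual) - predicted_scaled) * x_range)  # ~42394
```

As the last line shows, any error of ε in the scaled prediction becomes ε × 333502854 in the original units, so even a 0.0001 error already corresponds to roughly 33,350 in revenue.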
If I instead assume the predicted value to be 0.103827116, which matches the actual value to 8 decimal places, then the inverse-scaled value is close to the actual.
I hope this makes the problem I'm having a bit clearer. I'm looking for pointers on more appropriate scaling methods, since the min-max and standardized scaling techniques are clearly not working for this dataset.