Should outliers be removed if they follow the usual trend of the data and just happen to be extreme values?

Question

I'm working with a loan borrowers dataset. I have detected univariate outliers using the 1.5*IQR rule, and I am investigating, for each feature X, whether a calculated parameter increases or decreases as X increases. I noticed that although the features contain extreme values, these values follow the usual trend of the data, and in fact follow it even more strongly.
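
To make the flagging step concrete, here is a minimal sketch of the 1.5*IQR rule, assuming pandas. The function name `iqr_outlier_mask` and the toy income values are hypothetical and are not taken from the actual dataset.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the k*IQR fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical toy data for illustration only.
income = pd.Series([30_000, 35_000, 40_000, 42_000, 45_000, 50_000, 500_000])
mask = iqr_outlier_mask(income)
print(income[mask].tolist())
```

The mask only marks points as extreme relative to the bulk of the feature; it says nothing about whether those points agree with the trend.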

For example, suppose that on average 15 out of 100 people default when the natural log of income is in the 10-12 range, and that the slope of default rate against income is positive. In the outlier interval 12-14 the default rate rises to 20 out of 100 people, which is what I would expect if I ignored for a moment that these are extreme values. There are other features where the gap between the outlier and normal regions is even larger, yet they still follow the trend (e.g. IQR limit 40000, outlier value 500000).
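
The comparison described here can be sketched as a binned default rate, assuming pandas and NumPy. The helper `default_rate_by_bin`, the bin edges, and the synthetic data (built to mirror the illustrative 15/100 and 20/100 figures) are all hypothetical.

```python
import numpy as np
import pandas as pd


def default_rate_by_bin(log_income: pd.Series, defaulted: pd.Series, edges) -> pd.DataFrame:
    """Default rate and sample count per log-income bin (left-closed intervals)."""
    bins = pd.cut(log_income, bins=edges, right=False)
    return defaulted.groupby(bins, observed=True).agg(["mean", "count"])


# Synthetic data: 100 borrowers in the normal range [10, 12),
# 100 borrowers in the outlier range [12, 14).
log_income = pd.Series(
    np.concatenate([np.linspace(10, 11.99, 100), np.linspace(12, 13.99, 100)])
)
defaulted = pd.Series([1] * 15 + [0] * 85 + [1] * 20 + [0] * 80)

rates = default_rate_by_bin(log_income, defaulted, edges=[10, 12, 14])
print(rates)
```

Keeping the count column next to the rate matters, because the outlier bin usually holds far fewer borrowers and its rate is correspondingly noisier.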

So should I remove these extreme values even though they provide useful information? Or should I keep them even though they might skew the statistical analysis (in unknown ways)? More generally, how should I think about this problem, and what kind of investigation is needed before coming to a decision?

