I was searching for how to remove outliers in R, and I saw several comments saying that you should almost never remove outliers from a dataset. So when should we remove outliers? My house-price dataset contains outliers caused by sellers over-pricing their houses, and I think they are making my data noisy. Should I remove them in R? I know that with a large enough dataset they might become less important, right? But right now I have a small dataset with both categorical and numeric variables. Which methods in R can I use? Thank you.
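Since the question asks which R tools exist, here is one common and transparent starting point. It is an assumption on my part and not something recommended in the thread: Tukey's IQR rule through base R's `boxplot.stats()`. The sketch applies the rule to price per square metre rather than raw price, because raw price is expected to grow with size. The data frame `houses` and its columns `price`, `area` and `district` are hypothetical and simulated, as is the tripling of three asking prices. The rows are flagged rather than deleted so you can compare fits with and without them.

```r
# Hypothetical house-price data: columns `price`, `area`, `district` and all
# values are simulated for illustration only.
set.seed(42)
n <- 60
houses <- data.frame(
  area     = round(runif(n, 50, 200)),
  district = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
houses$price <- 1000 * houses$area + rnorm(n, sd = 8000)

# Simulate seller over-pricing (hypothetical): triple three asking prices
over <- c(5, 17, 33)
houses$price[over] <- houses$price[over] * 3

# Price scales with size, so apply the rule to price per square metre
houses$ppm2 <- houses$price / houses$area

# Tukey's rule (more than 1.5 * IQR beyond the hinges), the rule boxplot() uses;
# boxplot.stats()$out returns the values lying outside the whiskers
out_vals <- boxplot.stats(houses$ppm2, coef = 1.5)$out
houses$flag_iqr <- houses$ppm2 %in% out_vals

# Flag, don't delete: compare the fit with and without the flagged rows
fit_all  <- lm(price ~ area + district, data = houses)
fit_kept <- lm(price ~ area + district, data = houses, subset = !flag_iqr)
rbind(all = coef(fit_all), without_flagged = coef(fit_kept))
```

The IQR rule only identifies candidates. Whether the flagged rows should actually be excluded is still the subjective judgement discussed in the comments.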
I don't think this is the right forum for this question. Whether and how to remove outliers is a rather complex topic that is discussed in books on data mining. As far as I know, the decision is often subjective and usually depends on the problem being considered. In my opinion there is no cookie-cutter solution, and it is generally not a matter of programming. – RHertel Jul 31 '15 at 10:25
Yes, some people say outliers shouldn't be removed from a dataset and others say they should; it is subjective. However, I saw these discussions on Stack Overflow, which is why I wanted to ask here. @RHertel – tyer Jul 31 '15 at 10:40
Since you have "outliers because of over pricing of sellers in house prices", I would suggest that you use a model that can represent over-pricing. – Roland Jul 31 '15 at 11:28
I didn't understand exactly. Can you give an example? Do you mean I should use robust regression? @Roland – tyer Jul 31 '15 at 11:40
Robust regression might be an option, but I meant that you could use a model that explicitly models over-pricing. Sorry, I don't have time to create an example. I would consider that your job anyway. – Roland Jul 31 '15 at 11:46
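Robust regression is mentioned here as one option, so here is a minimal sketch of it. This is plain robust regression and not the explicit over-pricing model described in the comment. The sketch assumes a single numeric predictor and simulated data; the variable names, the true price of 1000 per square metre, and the tripled prices are all hypothetical. `MASS::rlm()` fits an M-estimator (Huber's psi by default). This down-weights observations with large residuals, so you don't have to delete them by hand.

```r
library(MASS)  # recommended package bundled with R; provides rlm()

# Simulated data (hypothetical): true price per square metre is 1000
set.seed(1)
n <- 60
area  <- runif(n, 50, 200)
price <- 1000 * area + rnorm(n, sd = 8000)

# Hypothetical over-priced listings: asking price tripled
over <- c(3, 20, 41, 55)
price[over] <- price[over] * 3

fit_ols <- lm(price ~ area)
fit_rob <- rlm(price ~ area)  # M-estimation with Huber's psi by default

# Compare the slope estimates from the two fits
c(ols = coef(fit_ols)[["area"]], robust = coef(fit_rob)[["area"]])

# Final IWLS weights: the over-priced rows are down-weighted, not deleted
round(fit_rob$w[over], 2)
```

Looking at `fit_rob$w` shows which listings the fit treated as suspicious. That can be a useful diagnostic before deciding whether any row really should be dropped.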
See also [How should outliers be dealt with in linear regression analysis?](http://stats.stackexchange.com/q/175/17230), [Replacing outliers with mean](http://stats.stackexchange.com/q/78063/17230), [Rigorous definition of an outlier?](http://stats.stackexchange.com/q/7155/17230), & [Is there a simple way of detecting outliers?](http://stats.stackexchange.com/q/37865/17230). If you've remaining doubts after reading these please edit your question to address them specifically. – Scortchi - Reinstate Monica Jul 31 '15 at 11:52
Sorry, I should have asked whether you know of an example paper like this. Thank you. @Roland – tyer Jul 31 '15 at 12:06