
Many online sources, including Cross Validated itself, claim that outliers shouldn't be removed from data (at least not without a good reason). In the thread below, for example, this is the general advice given by most users:

Is it OK to remove outliers from data?

I'd like to quote this claim from a trustworthy source, but I couldn't find such a statement in any introductory statistics book.

Is this belief just the general wisdom of the crowd, or is there something to support it?

EDIT (Justification that this question is not a duplicate)

Please note that this question asks for academic references (e.g. the name of a book or a paper) that support the claim; posts from CV are not enough for my current needs. I need something that could be quoted in a business report or in an academic paper. Unlike previous questions here on CV, this question is not concerned with whether removing outliers is good practice.

Possible Answer

In the comments, the user dave fournier suggested the following book:

Robust Statistics, 2nd Edition, Peter J. Huber and Elvezio M. Ronchetti, ISBN: 978-0-470-12990-6

I have only glanced at it, and it seems promising. I still need to read more of it to be sure that it directly answers the question.

Saul Berardo
  • 2
    This differs only from other general threads about outliers by a meta flavour, i.e. it seems to be asking why anyone should trust the general advice given here. It's hard to answer that without tautology or opening a debate about standards here. The point about CV is not that there are absolute guarantees about trustworthiness (there aren't) but that mechanisms of voting, editing and commentary should not leave poor answers in place and unchallenged. – Nick Cox Dec 14 '16 at 15:28
  • To comment on an orthogonal dimension: there are some excellent introductory texts but also many very poor ones just cloned from yet others by people charged with teaching; also, introductory texts don't usually claim to impart skills for the complexities and challenges of real, practical data analysis. You would not (should not!) take (e.g.) the first text in any degree course as codifying good general professional practice. – Nick Cox Dec 14 '16 at 15:32
  • Sorry if I wasn't clear enough. I didn't want to ask "why anyone should trust the general advice given here". I just want a reference for this claim because I need to put it in a report, i.e., I need the name of a book or the title of a paper. For this reason, one of the two tags of this question is "references". – Saul Berardo Dec 14 '16 at 15:49
  • 1
    If you want support for the idea that robust estimators are preferable to removing outliers see the beginning of chapter 1 in this book. Robust Statistics, 2nd Edition Peter J. Huber, Elvezio M. Ronchetti ISBN: 978-0-470-12990-6 – dave fournier Dec 14 '16 at 15:50
  • 2
    Because the related thread (a) provides well-reasoned argument, (b) quotes facts, and (c) supplies references, it is difficult to justify its conclusions as "wisdom of the crowd" and even harder to justify keeping the present thread open. – whuber Dec 14 '16 at 16:11
  • 3
    I'll give the fairly obvious answer: if you want to tamper with your data, it's your responsibility to explain why the tampering is valid, not the other party's responsibility to explain why it's invalid. –  Dec 14 '16 at 16:12
  • Thanks @davefournier, I'll take a look at this book. It seems a good reference. – Saul Berardo Dec 14 '16 at 16:52
  • Comment on edit: There are many threads on outliers here and they include several references. If anything, we would benefit from unification of existing threads. Asking for a reference doesn't justify this as a new thread in my view, especially when there are so many. – Nick Cox Dec 14 '16 at 17:20
  • I don't see a citation to a peer reviewed scientific publication that could be used in a new submission to a journal in the linked thread. I'm voting to reopen. – gung - Reinstate Monica Dec 14 '16 at 17:37
  • 1
    @gung In addition to Box, Hunter, & Hunter, answers and comments in the duplicate thread quote a half dozen sources, several of which look like peer-reviewed articles. – whuber Dec 14 '16 at 18:04
  • 3
    @whuber, I see a comment w/ Box et al, but the comment states that they claim outliers have led to new patents. That isn't (quite) what the OP is asking for. Rereading the thread & comments, I continue to not find a citation to a peer reviewed scientific publication that states outliers should not be removed. – gung - Reinstate Monica Dec 14 '16 at 18:08
  • @Tim We went round that circle about a day ago. I was for closing too, but this has just been re-opened. I suggest leaving as it is and ignoring it if you don't like it. – Nick Cox Dec 15 '16 at 13:30
  • @NickCox I didn't notice, then I'm retracting my vote. – Tim Dec 15 '16 at 13:33

0 Answers