Popular methods for outlier detection (right skewed distribution)

Question

What are the popular methods for outlier detection in univariate data, which do not assume normal distribution?

Many, many relevant questions on outliers here: have you consulted several? You need to have a specific new question to have much chance of a detailed answer. In any case, I wouldn't put much faith in the popularity of methods any more than I judge newspapers by their circulation. Dropping presumed outliers just because they appear inconvenient is, I guess, one of the most popular methods, but arguably the worst of all. — Nick Cox, Jan 30 '15 at 15:01
One possibility is http://stats.stackexchange.com/questions/78063/replacing-outliers-with-mean/78067#78067 In that thread, as may happen here, the answers were wider than the question. — Nick Cox, Jan 30 '15 at 15:16

Anthony · Answer 1 · 2015-01-30T15:21:58.770

Generally, you should avoid trimming outliers in an ad hoc fashion and instead use nonparametric or robust alternatives. A recent review with Monte Carlo studies can be found in Bakker and Wicherts (2014). At least in psychology journals, Z-score cut-offs were the most popular method. I wouldn't recommend them, though: the simulation studies in the same article demonstrate that Z-score cut-offs can inflate Type I error rates.
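To illustrate one reason Z-score cut-offs are unreliable (masking), here is a minimal Python sketch. The function `z_score_outliers` and the ten data values are hypothetical, made up for illustration. Two extreme values inflate the sample standard deviation so much that neither is flagged, even with a cut-off of 2. More generally, when z is computed with the sample SD, no |z| can exceed (n - 1)/√n, which is about 2.85 for n = 10, so a cut-off of 3 can never flag anything in a sample that small.

```python
import statistics

def z_score_outliers(data, threshold=3.0):
    """Return values whose |z| exceeds `threshold` (hypothetical helper).

    z is computed from the sample mean and the sample standard deviation
    (n - 1 denominator), as in a typical Z-score cut-off rule.
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]

# Made-up sample: eight "typical" values plus two extreme ones.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 80, 85]

# The two extremes pull the mean up to 26.5 and the SD up to about 29.6,
# so their own z-scores are only about 1.81 and 1.98: both are masked.
print(z_score_outliers(sample))                 # -> []
print(z_score_outliers(sample, threshold=2.0))  # -> []
```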

Although the review is focused on independent samples t-tests, most of their recommendations will apply more broadly. They concluded with the following recommendations:

• Correct or delete erroneous values.

• Based on prior research, it is not recommended to use Z scores to identify outliers. We recommend methods that suffer less from masking like the IQR or the MAD-median rule instead.

• Decide on outlier handling before seeing the results of the main analyses, and if possible, preregister the study at, for example, the Open Science Framework (http://openscienceframework.org/).

• If preregistration is not possible, report the outcomes both with and without outliers or on the basis of alternative methods.

• Report transparently about how outliers were handled.

• Do not carelessly remove outliers as this increases the probability of finding a false positive, especially when using a threshold value of Z lower than 3 or when the data are skewed.

• Use methods that are less influenced by outliers like nonparametric or robust methods such as the Mann-Whitney-Wilcoxon test and the Yuen-Welch test, or researchers may choose to conduct bootstrapping (all without removing outliers).
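For concreteness, the IQR rule and the MAD-median rule mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not code from Bakker and Wicherts (2014). The function names are hypothetical, k = 1.5 is the conventional Tukey multiplier, and the criterion 2.24 with MAD rescaled by 0.6745 is the version of the MAD-median rule commonly quoted in the robust-statistics literature (treat these constants as assumptions and check them against the source you follow).

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles use statistics.quantiles(method="inclusive"); other quartile
    definitions will move the fences slightly.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

def mad_median_outliers(data, criterion=2.24):
    """Return values with |x - M| / (MAD / 0.6745) > criterion.

    M is the median and MAD the median absolute deviation from M;
    dividing by 0.6745 rescales MAD to estimate the SD under normality.
    """
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    if mad == 0:
        # More than half the values are tied at the median: rule is undefined.
        raise ValueError("MAD is zero; the MAD-median rule cannot be applied")
    madn = mad / 0.6745
    return [x for x in data if abs(x - med) / madn > criterion]

# Same made-up sample as a Z-score example would use: two extreme values.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 80, 85]
print(iqr_outliers(sample))         # -> [80, 85]
print(mad_median_outliers(sample))  # -> [80, 85]
```

Unlike the mean and SD, the median, quartiles and MAD are barely moved by the two extreme values, which is why these rules suffer less from masking. Both rules still use fences that are symmetric around the center, so on genuinely right-skewed data, such as durations, they will tend to flag legitimate values in the long upper tail. In keeping with the advice above, treat flagged values as candidates for inspection rather than deleting them automatically.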

References:

Bakker, M., & Wicherts, J. M. (2014). Outlier removal, sum scores, and the inflation of the type I error rate in independent samples t tests: The power of alternatives and recommendations. Psychological Methods, 19(3), 409-427.

This is good advice, although I would greatly widen the list of approaches worth considering. I note that it does not answer the question as it says precisely nothing about popularity. As I consider that a dubious criterion, I am happy to upvote. — Nick Cox, Jan 30 '15 at 15:14
@Nick Cox, good points. I've edited the answer. It now makes clear that the cited article focused mainly on outliers with independent samples t-tests (that's the reason they emphasize Mann-Whitney-Wilcoxon and Yuen-Welch tests.) — Anthony, Jan 30 '15 at 15:18
@NickCox thanks for the link, it is quite useful. I have records of the behavior of n individuals, expressed in seconds. I want to apply a clustering algorithm to identify behavioral groups; however, I found that some individuals did other activities during the recorded time, and therefore the distribution is right-skewed. I believe this can affect my conclusions and the whole analysis, so I'm trying to find a way to reduce the influence of such records. — user27241, Jan 30 '15 at 15:47
You have a more specific question then. So either edit this question or (better, I think) start a new thread. However, I would advise expanding this outline greatly. — Nick Cox, Jan 30 '15 at 15:51
