Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset. A discomfiting possibility is that these data come from a different population than the one intended to be studied.

However, outliers are not necessarily bad or wrong, nor do they necessarily need to be removed from the data before further analysis. Still, outliers (of which there can be more than one in any set of data) indicate that at least some data appear to differ from the bulk of the data set, which suggests they should be individually examined and understood. Also, some statistical procedures are sensitive to outliers: removing one or more of them could substantially change the conclusions of those procedures.
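
To make the last point concrete, here is a small Python illustration with hypothetical data (the values are invented for the example): a single extreme value moves the mean a long way but barely moves the median.

```python
import statistics

# Hypothetical measurements; the last value is a gross outlier.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 58.0]

mean_all, median_all = statistics.mean(data), statistics.median(data)
mean_clean, median_clean = statistics.mean(data[:-1]), statistics.median(data[:-1])

# The mean jumps from about 10 to about 18; the median stays near 10.
print(f"with outlier:    mean={mean_all:.2f}, median={median_all:.2f}")
print(f"without outlier: mean={mean_clean:.2f}, median={median_clean:.2f}")
```
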

1220 questions
104 votes
13 answers

Simple algorithm for online outlier detection of a generic time series

I am working with a large number of time series. These time series are basically network measurements arriving every 10 minutes; some of them are periodic (e.g. the bandwidth), while others aren't (e.g. the amount of routing traffic). I would…
gianluca
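
For the online-detection question above, one simple approach is to compare each new reading with an exponentially weighted running mean and variance. The sketch below is a minimal illustration under stated assumptions, not a recommended production method: the class name `EwmaDetector`, the smoothing weight `alpha`, the threshold `k` and the `warmup` length are all hypothetical choices. It only suits series without a strong periodic pattern; for periodic ones such as bandwidth, the seasonal pattern would need to be removed first (for example by comparing each reading with readings at the same time of day).

```python
import math

class EwmaDetector:
    """Online outlier flagging against an exponentially weighted baseline.

    alpha:  weight given to each new observation (0 < alpha <= 1).
    k:      how many running standard deviations count as "far".
    warmup: observations absorbed before any flagging starts.
    """

    def __init__(self, alpha=0.1, k=3.0, warmup=10):
        self.alpha, self.k, self.warmup = alpha, k, warmup
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Return True if x looks anomalous, then fold x into the baseline."""
        if self.mean is None:
            self.mean, self.n = x, 1
            return False
        diff = x - self.mean
        std = math.sqrt(self.var)
        flagged = self.n >= self.warmup and std > 0 and abs(diff) > self.k * std
        # Incremental exponentially weighted mean and variance.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.n += 1
        return flagged

# Hypothetical 10-minute readings with one spike at index 11.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.2, 9.9, 10.0, 10.1, 50.0, 10.0]
detector = EwmaDetector(alpha=0.1, k=3.0, warmup=10)
flags = [detector.update(x) for x in readings]
```

Because flagged points still update the baseline, a spike inflates the running variance for a while and can mask another spike shortly after it. Skipping flagged points avoids that, but then a genuine level shift would be flagged indefinitely; which trade-off is acceptable depends on the data.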
99 votes
1 answer

Interpreting plot.lm()

I had a question about interpreting the graphs generated by plot(lm) in R. I was wondering if you guys could tell me how to interpret the scale-location and leverage-residual plots? Any comments would be appreciated. Assume basic knowledge of…
Guest
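
The two plots asked about are built from a few standard OLS quantities. The following is a minimal numpy sketch (Python rather than R, and not R's own code) that computes them under the usual definitions: the scale-location plot shows sqrt(|standardized residual|) against the fitted values, and the residuals-vs-leverage plot shows standardized residuals against leverage, with Cook's distance contours (R's `plot.lm` draws the 0.5 and 1 contours by default). The function name `lm_diagnostics` and the example data are hypothetical.

```python
import numpy as np

def lm_diagnostics(X, y):
    """Quantities behind the scale-location and residuals-vs-leverage plots.

    X: (n, p) design matrix that already includes an intercept column.
    Returns fitted values, standardized residuals, leverages, Cook's distances.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    # Leverage h_ii: diagonal of the hat matrix H = Q Q' from a thin QR of X.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    # Residual standard error with n - p degrees of freedom.
    s = np.sqrt(resid @ resid / (n - p))
    std_resid = resid / (s * np.sqrt(1 - leverage))
    cooks = std_resid**2 * leverage / (p * (1 - leverage))
    return fitted, std_resid, leverage, cooks

# Hypothetical data: an exact line plus one high-leverage point far off it.
x = np.append(np.arange(10.0), 20.0)
y = 2 * x + 1
y[-1] = 0.0
X = np.column_stack([np.ones_like(x), x])
fitted, std_resid, leverage, cooks = lm_diagnostics(X, y)
scale_location = np.sqrt(np.abs(std_resid))  # y-axis of the scale-location plot
```

In this example the last point has both the largest leverage and by far the largest Cook's distance, which is the pattern the residuals-vs-leverage plot is designed to expose.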
98 votes
13 answers

What is the best way to identify outliers in multivariate data?

Suppose I have a large set of multivariate data with at least three variables. How can I find the outliers? Pairwise scatterplots won't work as it is possible for an outlier to exist in 3 dimensions that is not an outlier in any of the 2-dimensional…
Rob Hyndman
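
One classical answer to the multivariate question above is the squared Mahalanobis distance, which accounts for correlations between variables, so a point can stand out even when each coordinate looks ordinary. The sketch below is a minimal illustration with hypothetical data; the function name `mahalanobis_sq` is invented for it. The cutoff assumes the data are roughly multivariate normal (the 97.5% chi-square quantile with 3 degrees of freedom is about 9.348). Note that the classical mean and covariance are themselves pulled by outliers, which can mask them; robust estimates such as the minimum covariance determinant are often preferred.

```python
import numpy as np

def mahalanobis_sq(data):
    """Squared Mahalanobis distance of each row from the sample mean."""
    centered = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    # Row-wise quadratic form (x - mean)' S^{-1} (x - mean).
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

# Hypothetical data: three strongly correlated variables.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
data = np.column_stack([z, z + 0.1 * rng.normal(size=200), z + 0.1 * rng.normal(size=200)])
# Each coordinate of this point is unremarkable on its own,
# but it breaks the correlation between the first two variables.
data = np.vstack([data, [1.0, -1.0, 1.0]])

d2 = mahalanobis_sq(data)
CUTOFF = 9.348  # approximate 97.5% quantile of chi-square with 3 df
flagged = np.flatnonzero(d2 > CUTOFF)
```
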
94 votes
6 answers

Essential data checking tests

In my job role I often work with other people's datasets: non-experts bring me clinical data and I help them to summarise it and perform statistical tests. The problem I am having is that the datasets I am brought are almost always riddled with…
Chris Beeley
90 votes
15 answers

What do you call an average that does not include outliers?

What do you call an average that does not include outliers? For example, if you have the set {90, 89, 92, 91, 5}, avg = 73.4, but excluding the outlier (5) we have {90, 89, 92, 91}, avg = 90.5. How do you describe this average in statistics?
Tawani
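
A symmetric version of this idea is usually called a trimmed (or truncated) mean: a fixed fraction of the sorted values is dropped from each end before averaging. A minimal sketch using the numbers from the question; the helper name `trimmed_mean` is hypothetical, and SciPy's `scipy.stats.trim_mean` provides a tested implementation. Because it trims both tails equally, 20% trimming drops 92 as well as 5, so the result differs slightly from the 90.5 in the question.

```python
import math

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each end."""
    ordered = sorted(values)
    k = math.floor(len(ordered) * proportion)  # points cut from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [90, 89, 92, 91, 5]
print(trimmed_mean(data, 0.0))  # 73.4, the ordinary mean
print(trimmed_mean(data, 0.2))  # 90.0, the mean of 89, 90, 91
```
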
89 votes
10 answers

How should outliers be dealt with in linear regression analysis?

Oftentimes a statistical analyst is handed a dataset and asked to fit a model using a technique such as linear regression. Very frequently the dataset is accompanied by a disclaimer similar to "Oh yeah, we messed up collecting some of these…
Sharpie
85 votes
14 answers

Why haven't robust (and resistant) statistics replaced classical techniques?

When solving business problems using data, it's common for at least one key assumption that underpins classical statistics to be invalid. Most of the time, no one bothers to check those assumptions, so you never actually know. For instance, that so…
doug
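
A small example of the robust-versus-classical contrast raised above: replacing the mean and standard deviation in a z-score with the median and the median absolute deviation (MAD). The data are hypothetical and the function names are invented for this sketch. With two gross outliers, the classical standard deviation is inflated enough that neither outlier reaches |z| = 2 (masking), while the robust version flags both clearly.

```python
import statistics

def classical_z(values):
    """Ordinary z-scores: distance from the mean in standard deviations."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def robust_z(values):
    """Median/MAD analogue of the z-score.

    1.4826 rescales the MAD so it estimates the standard deviation when the
    data are normal.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [(v - med) / (1.4826 * mad) for v in values]

# Hypothetical data: seven typical values and two gross outliers.
data = [2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 1.8, 9.5, 9.7]
z_classic = classical_z(data)  # largest |z| is about 1.8
z_robust = robust_z(data)      # the two outliers score above 20
```
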
53 votes
4 answers

Fast linear regression robust to outliers

I am dealing with linear data with outliers, some of which are more than 5 standard deviations away from the estimated regression line. I'm looking for a linear regression technique that reduces the influence of these points. So far what I did is…
Matteo Fasiolo
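
One fast option for the question above is a Huber M-estimator fitted by iteratively reweighted least squares (IRLS), which costs only a handful of weighted least-squares solves. The sketch below is a minimal version under stated assumptions: the function name `huber_irls`, the MAD-based residual scale and the stopping rule are choices made for illustration, and libraries such as statsmodels (`RLM` with a Huber norm) offer tested implementations. Huber weights reduce the influence of large residuals but do not protect against high-leverage outliers in x.

```python
import numpy as np

def huber_irls(X, y, c=1.345, max_iter=50, tol=1e-8):
    """Huber M-estimate of regression coefficients by iteratively
    reweighted least squares (IRLS).

    X must already contain an intercept column. c = 1.345 is the usual
    Huber tuning constant.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from OLS
    for _ in range(max_iter):
        resid = y - X @ beta
        # Robust residual scale: MAD rescaled to estimate a normal SD.
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        if scale == 0:
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 for small residuals, c/|u| for large ones.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Hypothetical data: y = 3 + 0.5x with small wiggles, plus two gross
# vertical outliers at large x.
x = np.arange(20.0)
y = 3 + 0.5 * x + 0.1 * np.sin(x)
y[[17, 18]] += 15
X = np.column_stack([np.ones_like(x), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_huber = huber_irls(X, y)
```

Here the OLS slope is pulled well above 0.5 by the two outliers, while the Huber slope stays close to it.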
46 votes
3 answers

How are Random Forests not sensitive to outliers?

I've read in a few sources, including this one, that Random Forests are not sensitive to outliers (in the way that Logistic Regression and other ML methods are, for example). However, two pieces of intuition tell me otherwise: Whenever a decision…
makansij
45 votes
8 answers

Rigorous definition of an outlier?

People often talk about dealing with outliers in statistics. The thing that bothers me about this is that, as far as I can tell, the definition of an outlier is completely subjective. For example, if the true distribution of some random variable…
dsimcha
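
A widely used working rule, which illustrates the subjectivity the question describes rather than resolving it, is Tukey's fences: points below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are flagged. The multiplier 1.5 is a convention, not something derived from a probability model, and quartiles are computed differently across software (Python's `statistics.quantiles` defaults to its "exclusive" method), so borderline points can change status. A minimal sketch with hypothetical data; the function name is invented.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return Tukey's (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sample with one suspicious value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
lower, upper = tukey_fences(data)
outside = [v for v in data if v < lower or v > upper]
```
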
40 votes
1 answer

Detecting Outliers in Time Series (LS/AO/TC) using tsoutliers package in R. How to represent outliers in equation format?

Comments: Firstly, I would like to say a big thank you to the author of the new tsoutliers package, which implements Chen and Liu's time series outlier detection method, published in the Journal of the American Statistical Association in 1993 in…
forecaster
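
On the "equation format" part of the question above: in Chen and Liu's (1993) framework, each outlier type is a known dynamic pattern applied to a pulse at the outlier date. The notation below is the standard textbook form, not copied from the tsoutliers documentation: y_t is the outlier-free series (for example an ARIMA process), ω the size of the outlier, B the backshift operator, t_1 the outlier's time and δ a decay parameter for the temporary change.

```latex
\begin{aligned}
y_t^{*} &= y_t + \omega\,\xi(B)\,I_t(t_1),
\qquad I_t(t_1) = \begin{cases} 1, & t = t_1\\ 0, & t \neq t_1 \end{cases}\\[6pt]
\xi_{\text{AO}}(B) &= 1
  &&\Rightarrow\ \text{effect } \omega \text{ at } t = t_1 \text{ only}\\
\xi_{\text{LS}}(B) &= \frac{1}{1 - B}
  &&\Rightarrow\ \text{effect } \omega \text{ for all } t \ge t_1\\
\xi_{\text{TC}}(B) &= \frac{1}{1 - \delta B},\ 0 < \delta < 1
  &&\Rightarrow\ \text{effect } \omega\,\delta^{\,t - t_1} \text{ for } t \ge t_1
\end{aligned}
```

With several detected outliers, their effects simply add: y*_t = y_t + Σ_j ω_j ξ_j(B) I_t(t_j).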
39 votes
1 answer

Link Anomaly Detection in Temporal Network

I came across this paper that uses link anomaly detection to predict trending topics, and I found it incredibly intriguing: The paper is "Discovering Emerging Topics in Social Streams via Link Anomaly Detection". I would love to replicate it on a…
Olga Mu
38 votes
8 answers

Is it OK to remove outliers from data?

I looked for a way to remove outliers from a dataset and I found this question. In some of the comments and answers to this question, however, people mentioned that it is bad practice to remove outliers from the data. In my dataset I have several…
Sininho
34 votes
4 answers

Why isn't RANSAC more widely used in statistics?

Coming from the field of computer vision, I've often used the RANSAC (Random Sample Consensus) method for fitting models to data with lots of outliers. However, I've never seen it used by statisticians, and I've always been under the impression…
Bossykena
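
For readers coming from statistics rather than computer vision, here is a minimal RANSAC sketch for a straight line, assuming numpy: repeatedly fit a line through two random points, count the points within a distance `threshold` of it (the consensus set), keep the largest set, and refit ordinary least squares on it. The function name, the threshold and the number of trials are hypothetical choices for this example; scikit-learn's `RANSACRegressor` is a tested general-purpose implementation.

```python
import numpy as np

def ransac_line(x, y, n_trials=200, threshold=1.0, seed=0):
    """Fit y = a + b*x with RANSAC; return (a, b, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # this pair cannot define y = a + b*x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        inliers = np.abs(y - (a + b * x)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final ordinary least-squares refit on the consensus set.
    b, a = np.polyfit(x[best_inliers], y[best_inliers], 1)
    return a, b, best_inliers

# Hypothetical data: y = 1 + 2x with small wiggles; every third point is
# shifted up by 25 to act as a gross outlier (10 of 30 points).
x = np.arange(30.0)
y = 1 + 2 * x + 0.1 * np.sin(x)
y[::3] += 25
a, b, inliers = ransac_line(x, y)
```

Unlike M-estimators, RANSAC tolerates a large share of outliers, but it needs a residual threshold chosen in advance, which is one practical difference from the robust regression methods more common in statistics.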
33 votes
8 answers

Replacing outliers with mean

This question was asked by my friend who is not internet savvy. I've no statistics background and I've been searching around the internet for this question. The question is: is it possible to replace outliers with the mean value? If it's possible, is…
Alun
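
One consequence often raised about replacing outliers with the mean can be shown with hypothetical numbers: the filled-in value adds a point with zero deviation, so the mean is unchanged but the standard deviation shrinks, which can make later standard errors and tests look more precise than they are. The bounds used to decide what counts as an outlier, and the helper name `replace_with_mean`, are assumptions for this sketch.

```python
import statistics

def replace_with_mean(values, lower, upper):
    """Replace values outside [lower, upper] with the mean of those inside.

    Returns the modified list and the list of values that were kept.
    """
    inside = [v for v in values if lower <= v <= upper]
    fill = statistics.mean(inside)
    return [v if lower <= v <= upper else fill for v in values], inside

# Hypothetical data and bounds; 95 is treated as the outlier.
data = [12, 15, 11, 14, 13, 95, 12, 16]
cleaned, kept = replace_with_mean(data, lower=0, upper=50)

# Same mean as the kept values, but a smaller standard deviation.
print(statistics.mean(kept), statistics.mean(cleaned))
print(statistics.stdev(kept), statistics.stdev(cleaned))
```
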