Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior. This is also known as outlier detection.
Questions tagged [anomaly-detection]
410 questions
79 votes, 9 answers
What algorithm should I use to detect anomalies on time-series?
Background
I work in a Network Operations Center, where we monitor computer systems and their performance. One of the key metrics is the number of visitors/customers currently connected to our servers. To make it visible, we (the Ops team) collect…

Ilya Khadykin
24 votes, 5 answers
Algorithms for Time Series Anomaly Detection
I'm currently using Twitter's AnomalyDetection in R: https://github.com/twitter/AnomalyDetection. This algorithm provides time series anomaly detection for data with seasonality.
Question: are there any other algorithms similar to this (controlling…

Eric Miller
19 votes, 2 answers
Anomaly Detection with Dummy Features (and other Discrete/Categorical Features)
tl;dr
What is the recommended way to deal with discrete data when performing anomaly detection?
What is the recommended way to deal with categorical data when performing anomaly detection?
This answer suggests using discrete data to just filter the…

Adrian Torrie
19 votes, 1 answer
Robust PCA vs. robust Mahalanobis distance for outlier detection
Robust PCA (as developed by Candès et al. 2009 or, better yet, Netrapalli et al. 2014) is a popular method for multivariate outlier detection, but Mahalanobis distance can also be used for outlier detection given a robust, regularized estimate of the…

Mustafa Eisa
19 votes, 7 answers
Difference between Anomaly and Outlier
What is the difference between an outlier and an anomaly in the context of machine learning? My understanding is that both refer to the same thing.

user3282512
15 votes, 3 answers
Encoding of categorical variables with high cardinality
For unsupervised anomaly detection / fraud analytics on credit card data (where I don't have labeled fraudulent cases), there are a lot of variables to consider. The data is of mixed type with continuous/numerical variables (e.g. USD amount spent)…

robot_2077198
13 votes, 3 answers
scikit-learn IsolationForest anomaly score
According to the Isolation Forest papers (references are given in the documentation), the score produced by Isolation Forest should be between 0 and 1.
The implementation in scikit-learn negates the scores (so a higher score means more of an inlier) and also seems to shift…
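For orientation, here is a minimal sketch of the score normalization described in the Isolation Forest paper, next to a negated version that mirrors the sign convention the excerpt mentions. It uses only the standard library and does not call scikit-learn. The function names `average_path_length`, `paper_score` and `negated_score` are hypothetical, and the harmonic-number approximation is only meant for sample sizes above 2. The scikit-learn documentation describes `score_samples` as the opposite of the paper's anomaly score and `decision_function` as `score_samples` minus the fitted `offset_`, which would account for both the negation and the shift.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant used in the paper's approximation


def average_path_length(n):
    """c(n) from the paper: expected path length of an unsuccessful
    binary-search-tree lookup over n points (sketch valid for n > 2)."""
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def paper_score(mean_path_length, n):
    """Anomaly score s = 2^(-E[h(x)] / c(n)); lies in (0, 1], higher = more anomalous."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


def negated_score(mean_path_length, n):
    """Same quantity with the sign flipped, so higher = more inlier-like."""
    return -paper_score(mean_path_length, n)


n = 256                              # sub-sample size (assumed for illustration)
typical = average_path_length(n)     # a point with an average path length
print(paper_score(typical, n))       # 0.5
print(paper_score(2.0, n) > 0.5)     # True: short paths look anomalous
print(negated_score(2.0, n) < negated_score(typical, n))  # True
```

Under this convention, a point whose mean path length equals c(n) scores exactly 0.5 in the paper's scale and -0.5 after negation.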

DAF
11 votes, 1 answer
Difference between Outlier and Inlier
I stumbled upon the term inlier in the LOF (Local Outlier Factor) measure. I am familiar with the term outlier (well, basically "liers": instances which don't behave like the rest of the instances).
What does 'Inliers' mean in the context of…

Anton.P
10 votes, 3 answers
Simple outlier detection for time series
I wanted to generate a very simple example of anomaly detection for time series. So I created sample data with one very obvious outlier. Here's a picture of the data:
The problem is that, so far, I haven't found any method that detects the outlier reliably. I…

Marcus Wenzel
10 votes, 4 answers
Feature Importance in Isolation Forest
In an unsupervised setting for higher-dimensional data (e.g. 10 variables (numerical and categorical), 5000 samples, ratio of anomalies likely 1% or below but unknown) I am able to fit the isolation forest and retrieve computed anomaly scores…

robot_2077198
10 votes, 1 answer
Are time series motifs and the Matrix profile algorithm a good fit for my problem?
I have huge multivariate time series to analyze (terabytes of data) and I need fast, scalable algorithms mainly for two tasks:
finding similar patterns among time series. For example, imagine I identify a certain pattern in a reference time series.…

DeltaIV
10 votes, 3 answers
Time Series Anomaly Detection with Python
I need to implement anomaly detection on several time-series datasets. I've never done this before and was hoping for some advice. I'm very comfortable with Python, so I would prefer the solution be implemented in it (most of my code is Python for…

Eric Miller
9 votes, 3 answers
Can Negative Binomial parameters be treated like Poisson?
I have a count process that I'd like to model with a Poisson process. Data is measured every 30 minutes, and with a Poisson distribution I can easily measure the probability of a given count of events being anomalous in different time periods using…
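As a rough illustration of the Poisson idea in this excerpt, the sketch below computes an upper-tail probability for a single 30-minute window. It assumes a known, fixed rate per window; the function name `poisson_upper_tail`, the rate of 4.0 and the 1% threshold are all hypothetical choices for the example, not taken from the question.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term                 # add P(X = i)
        term *= lam / (i + 1)       # recurrence: P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


rate_per_window = 4.0   # assumed mean count per 30-minute window
observed = 12           # count seen in the window under inspection
threshold = 0.01        # assumed cut-off for calling a count anomalous

p_value = poisson_upper_tail(observed, rate_per_window)
is_anomalous = p_value < threshold
print(is_anomalous)     # True
```

If the counts are overdispersed relative to a Poisson model, the same tail-probability logic applies but the tail would need to come from a different distribution.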

J Doe
9 votes, 2 answers
Methods to detect published mistakes without raw data?
I'm interested in ways to detect mistakes in published papers without analyzing the raw data, for example the GRIM test [1]. Here's another, similar one from the blog of one of the GRIM authors. I don't know of any others.
Looking for inconsistencies…

R Greg Stacey
9 votes, 1 answer
Anomaly detection using PCA reconstruction error
I would like to use PCA as a method of anomaly detection; however, I'm wondering how exactly this is done (I'm using prcomp in R).
I'm really questioning the approach, not the R code itself.
Am I right in thinking I first run PCA on a bunch of data to…
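One common reading of "PCA reconstruction error" can be sketched as follows: fit the principal components on reference data, project new points onto the leading components, map them back, and use the squared residual as an anomaly score. This is a minimal NumPy sketch under that assumption, not the approach from the question (which uses prcomp in R and is truncated here). The function name `pca_reconstruction_error` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_reconstruction_error(train, test, n_components):
    """Squared reconstruction error of `test` rows using PCA fitted on `train`."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions of the centered training data.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_components]
    centered = test - mean
    projected = centered @ components.T     # coordinates in the reduced space
    reconstructed = projected @ components  # back to the original space
    return np.sum((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)
t = rng.normal(size=200)
# Training data lies close to the line y = 2x (one dominant component).
train = np.column_stack([t, 2.0 * t + rng.normal(scale=0.05, size=200)])

# First test point follows the training pattern, second one breaks it.
test = np.array([[1.0, 2.0], [1.0, -2.0]])
errors = pca_reconstruction_error(train, test, n_components=1)
print(errors[1] > errors[0])  # True
```

Points that follow the correlation structure of the training data reconstruct well, while points that break it leave a large residual; how many components to keep and where to set the cut-off are choices this sketch leaves open.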

PaulB.