
In Shane's answer to this question, he suggests that Hidden Markov Models can be used more successfully than wavelets for anomaly/change detection (it was a bit unclear: the topic he was addressing is anomaly detection, although he uses the words "change detection").

I am not very familiar with Hidden Markov Models, but as I understand them, they require a known Markov process (all states and transition probabilities known) and, for each state, a known set of emission probabilities. The really interesting thing that can be done with them is that, given a sequence of emissions, one can find the most likely sequence of hidden states that would have led to those emissions.
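To make that decoding step concrete, here is a minimal sketch of the Viterbi algorithm for a toy two-state HMM with discrete emissions. All the parameters and names below are illustrative placeholders of my own, not anything taken from Shane's answer:

```python
import numpy as np

# Toy HMM (made-up numbers): 2 hidden states, 3 possible observation symbols.
start = np.array([0.5, 0.5])        # initial state distribution
trans = np.array([[0.9, 0.1],       # trans[i, j] = P(X_t = j | X_{t-1} = i)
                  [0.2, 0.8]])
emit  = np.array([[0.7, 0.2, 0.1],  # emit[i, y]  = P(Y_t = y | X_t = i)
                  [0.1, 0.3, 0.6]])

def viterbi(obs, start, trans, emit):
    """Most likely hidden-state sequence for `obs`, computed in log space."""
    T, n_states = len(obs), len(start)
    logp = np.zeros((T, n_states))             # best log-prob ending in each state
    back = np.zeros((T, n_states), dtype=int)  # best predecessor of each state
    logp[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        for j in range(n_states):
            scores = logp[t - 1] + np.log(trans[:, j])
            back[t, j] = np.argmax(scores)
            logp[t, j] = scores[back[t, j]] + np.log(emit[j, obs[t]])
    path = [int(np.argmax(logp[-1]))]          # backtrack from best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 2, 2, 1], start, trans, emit))  # -> [0, 0, 1, 1, 1]
```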

Assuming that I am understanding HMMs correctly (please correct me if I am wrong), how are they used for anomaly detection? How would one determine the underlying Markov process and the emission probabilities in order to use an HMM for anomaly detection?

John Robertson
  • In editing your post I replaced the word "emission" with "transition" because the term "emission probability" is not commonly used to describe Markov chains and I thought that you were probably referring to transition probabilities. If that is wrong, please make appropriate changes. – Michael R. Chernick Aug 29 '12 at 10:41
  • Emission probabilities are used in **hidden** Markov models and refer to the probability of an observation given a hidden state. Generally you have a hidden state $X_i$ which is related to $X_{i-1}$ by **transition** probabilities and an observation $Y_i$ which is related to $X_i$ by **emission** probabilities. The $X_i$ are never observed and need to be inferred using the Viterbi algorithm, or expectation maximisation if the transition probabilities are also unknown. – tristan Aug 29 '12 at 12:42
  • Your understanding seems accurate to me. Anomaly detection is a bit too vague a term to answer the question precisely; could you give a concrete example of the data and the type of anomaly you want to detect? E.g., a single-observation anomaly, a change-point in the system behaviour, ... Also, the specific algorithm usually used to infer the unknown parameters of an HMM is the *Baum-Welch algorithm*. – tristan Aug 29 '12 at 13:11
  • Then I shall fix my edit. – Michael R. Chernick Aug 29 '12 at 13:57
  • I successfully used the recursive CUSUM test (a generalized fluctuation test; see Kuan & Hornik 1995), which is also implemented in the R package strucchange: http://cran.r-project.org/web/packages/strucchange/index.html. Still, HMMs can be used to detect changes in time series in a similar way (by looking at the most likely sequence of states corresponding to the data). – thias Aug 31 '12 at 11:50

1 Answer


According to this Wikipedia article, there are many inference benefits you gain from an HMM. The first is the ability to assign a probability to any observation sequence $\mathbf{Y} = (Y_1,\ldots, Y_N)$ by marginalizing over the set of all possible hidden state sequences $\mathbf{X} = (X_1,\ldots, X_N)$:

$P(\mathbf{Y}) = \sum_{\mathbf{X}} P(\mathbf{X}) P( \mathbf{Y} \vert \mathbf{X} )$
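Evaluated naively, this sum has exponentially many terms, but the forward algorithm computes it in $O(N \cdot S^2)$ time for $S$ hidden states. A minimal sketch, assuming a discrete-emission HMM whose parameters (made up here purely for illustration) are already known:

```python
import numpy as np

# Toy HMM parameters (illustrative placeholders only).
start = np.array([0.5, 0.5])        # initial state distribution
trans = np.array([[0.9, 0.1],       # trans[i, j] = P(X_t = j | X_{t-1} = i)
                  [0.2, 0.8]])
emit  = np.array([[0.7, 0.2, 0.1],  # emit[i, y]  = P(Y_t = y | X_t = i)
                  [0.1, 0.3, 0.6]])

def forward_prob(obs, start, trans, emit):
    """Marginal likelihood P(Y) of an observation sequence."""
    alpha = start * emit[:, obs[0]]           # alpha_1(x) = P(x) P(y_1 | x)
    for y in obs[1:]:
        alpha = (alpha @ trans) * emit[:, y]  # one forward step over states
    return alpha.sum()                        # marginalize out the final state

print(forward_prob([0, 0, 2, 2, 1], start, trans, emit))
```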

This way, you can assign probabilities to observation sequences even in an on-line manner, as observations arrive (using the very efficient forward algorithm). An anomaly is an observation that is (relatively) highly unlikely according to $P(\mathbf{Y})$ (a threshold can be used to decide). Of course, the value of $P(\mathbf{Y})$ grows smaller and smaller as $N$ increases, so many methods can be used to renormalize $P(\mathbf{Y})$ to keep it within the representable range of floating-point data types and enable meaningful thresholding. For example, we might use the following as an anomaly measure:

$\mathbb{A}_{N} = \log P(Y_N \vert Y_1,\ldots, Y_{N-1}) = \log \frac{P(Y_1,\ldots, Y_N)}{P(Y_1,\ldots, Y_{N-1})} = \log P(Y_1,\ldots, Y_N) - \log P(Y_1,\ldots, Y_{N-1})$
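As a sketch of how $\mathbb{A}_N$ could be computed on-line, the following keeps the forward recursion in log space, so the shrinking $P(\mathbf{Y})$ never underflows. The parameters are assumed to be the same toy `start`, `trans`, `emit` arrays as in the sketch above, and the function name is my own:

```python
import numpy as np
from scipy.special import logsumexp

def online_anomaly_scores(obs, start, trans, emit):
    """Yield A_t = log P(y_t | y_1, ..., y_{t-1}) for each observation."""
    log_alpha = np.log(start) + np.log(emit[:, obs[0]])
    prev_loglik = logsumexp(log_alpha)           # log P(y_1)
    yield prev_loglik                            # A_1 = log P(y_1)
    for y in obs[1:]:
        # Log-space forward step: logsumexp over predecessor states.
        log_alpha = (logsumexp(log_alpha[:, None] + np.log(trans), axis=0)
                     + np.log(emit[:, y]))
        loglik = logsumexp(log_alpha)            # log P(y_1, ..., y_t)
        yield loglik - prev_loglik               # A_t; flag if below threshold
        prev_loglik = loglik
```

Each $\mathbb{A}_t$ stays on a comparable scale no matter how long the sequence gets, so a single fixed threshold (chosen, say, from scores on normal training data) can be used to flag anomalous observations.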

Ahmed Nassar
  • Can you point to the source of such renormalization techniques? It is not clear to me why this difference should correspond to an anomaly. – Yury Feb 17 '16 at 18:33