
In Shane's answer to this question, he suggests that Hidden Markov Models can be used more successfully than wavelets for anomaly/change detection (it was a bit unclear: the topic he was addressing is anomaly detection, although he uses the words "change detection").

I am not very familiar with Hidden Markov Models, but as I understand them, they require a known Markov process (all states and transition probabilities known) and, for each state, a known set of emission probabilities. The really interesting thing that can be done with them is that, given a sequence of emissions, one can find the most likely sequence of hidden states that would have led to those emissions.
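To make that decoding step concrete, here is a minimal sketch of the Viterbi algorithm for a toy two-state HMM with discrete emissions. All the parameters and names below are illustrative placeholders of my own, not anything taken from Shane's answer:

```python
import numpy as np

# Toy HMM (made-up numbers): 2 hidden states, 3 possible observation symbols.
start = np.array([0.5, 0.5])        # initial state distribution
trans = np.array([[0.9, 0.1],       # trans[i, j] = P(X_t = j | X_{t-1} = i)
                  [0.2, 0.8]])
emit  = np.array([[0.7, 0.2, 0.1],  # emit[i, y]  = P(Y_t = y | X_t = i)
                  [0.1, 0.3, 0.6]])

def viterbi(obs, start, trans, emit):
    """Most likely hidden-state sequence for `obs`, computed in log space."""
    T, n_states = len(obs), len(start)
    logp = np.zeros((T, n_states))             # best log-prob ending in each state
    back = np.zeros((T, n_states), dtype=int)  # best predecessor of each state
    logp[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, T):
        for j in range(n_states):
            scores = logp[t - 1] + np.log(trans[:, j])
            back[t, j] = np.argmax(scores)
            logp[t, j] = scores[back[t, j]] + np.log(emit[j, obs[t]])
    path = [int(np.argmax(logp[-1]))]          # backtrack from best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 2, 2, 1], start, trans, emit))  # -> [0, 0, 1, 1, 1]
```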

Assuming that I am understanding HMMs correctly (please correct me if I am wrong), how are they used for anomaly detection? How would one determine the underlying Markov process and the emission probabilities in order to use an HMM for anomaly detection?

John Robertson
  • In editing your post I replaced the word "emission" with "transition" because the term "emission probability" is not commonly used to describe Markov chains and I thought that you were probably referring to transition probabilities. If that is wrong, please make appropriate changes. – Michael R. Chernick Aug 29 '12 at 10:41
  • Emission probabilities are used in **hidden** Markov models and refer to the probability of an observation given a hidden state. Generally you have a hidden state $X_i$ which is related to $X_{i-1}$ by **transition** probabilities and an observation $Y_i$ which is related to $X_i$ by **emission** probabilities. The $X_i$ are never observed and need to be inferred using the Viterbi algorithm, or expectation maximisation if the transition probabilities are also unknown. – tristan Aug 29 '12 at 12:42
  • Your understanding seems accurate to me. Anomaly detection is a bit too vague a term to answer the question precisely; could you give a concrete example of the data and the type of anomaly you want to detect? E.g., a single-observation anomaly, a change-point in the system behaviour, ... Also, the specific algorithm usually used to infer the unknown parameters of an HMM is the *Baum-Welch algorithm*. – tristan Aug 29 '12 at 13:11
  • Then I shall fix my edit. – Michael R. Chernick Aug 29 '12 at 13:57
  • I successfully used the recursive CUSUM test (a generalized fluctuation test; see Kuan & Hornik 1995), which is also implemented in the R package strucchange: http://cran.r-project.org/web/packages/strucchange/index.html. Still, HMMs can be used to detect changes in time series in a similar way (by looking at the most likely sequence of states corresponding to the data). – thias Aug 31 '12 at 11:50

1 Answer


According to this Wikipedia article, there are many inference benefits you gain from an HMM. The first is the ability to assign a probability to any observation sequence $\mathbf{Y} = (Y_1,\ldots, Y_N)$ by marginalizing over the set of all possible hidden state sequences $\mathbf{X} = (X_1,\ldots, X_N)$:

$P(\mathbf{Y}) = \sum_{\mathbf{X}} P(\mathbf{X}) P( \mathbf{Y} \vert \mathbf{X} )$
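Evaluated naively, this sum has exponentially many terms, but the forward algorithm computes it in $O(N \cdot S^2)$ time for $S$ hidden states. A minimal sketch, assuming a discrete-emission HMM whose parameters (made up here purely for illustration) are already known:

```python
import numpy as np

# Toy HMM parameters (illustrative placeholders only).
start = np.array([0.5, 0.5])        # initial state distribution
trans = np.array([[0.9, 0.1],       # trans[i, j] = P(X_t = j | X_{t-1} = i)
                  [0.2, 0.8]])
emit  = np.array([[0.7, 0.2, 0.1],  # emit[i, y]  = P(Y_t = y | X_t = i)
                  [0.1, 0.3, 0.6]])

def forward_prob(obs, start, trans, emit):
    """Marginal likelihood P(Y) of an observation sequence."""
    alpha = start * emit[:, obs[0]]           # alpha_1(x) = P(x) P(y_1 | x)
    for y in obs[1:]:
        alpha = (alpha @ trans) * emit[:, y]  # one forward step over states
    return alpha.sum()                        # marginalize out the final state

print(forward_prob([0, 0, 2, 2, 1], start, trans, emit))
```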

This way, you can assign probabilities to observation sequences even in an on-line manner, as observations arrive (using the very efficient forward algorithm). An anomaly is an observation that is (relatively) highly unlikely according to $P(\mathbf{Y})$ (a threshold can be used to decide). Of course, the value of $P(\mathbf{Y})$ grows smaller and smaller as $N$ increases, so many methods can be used to renormalize $P(\mathbf{Y})$ to keep it within the representable range of floating-point data types and enable meaningful thresholding. For example, we might use the following as an anomaly measure:

$\mathbb{A}_{N} = \log P(Y_N \vert Y_1,\ldots, Y_{N-1}) = \log \frac{P(Y_1,\ldots, Y_N)}{P(Y_1,\ldots, Y_{N-1})} = \log P(Y_1,\ldots, Y_N) - \log P(Y_1,\ldots, Y_{N-1})$
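As a sketch of how $\mathbb{A}_N$ could be computed on-line, the following keeps the forward recursion in log space, so the shrinking $P(\mathbf{Y})$ never underflows. The parameters are assumed to be the same toy `start`, `trans`, `emit` arrays as in the sketch above, and the function name is my own:

```python
import numpy as np
from scipy.special import logsumexp

def online_anomaly_scores(obs, start, trans, emit):
    """Yield A_t = log P(y_t | y_1, ..., y_{t-1}) for each observation."""
    log_alpha = np.log(start) + np.log(emit[:, obs[0]])
    prev_loglik = logsumexp(log_alpha)           # log P(y_1)
    yield prev_loglik                            # A_1 = log P(y_1)
    for y in obs[1:]:
        # Log-space forward step: logsumexp over predecessor states.
        log_alpha = (logsumexp(log_alpha[:, None] + np.log(trans), axis=0)
                     + np.log(emit[:, y]))
        loglik = logsumexp(log_alpha)            # log P(y_1, ..., y_t)
        yield loglik - prev_loglik               # A_t; flag if below threshold
        prev_loglik = loglik
```

Each $\mathbb{A}_t$ stays on a comparable scale no matter how long the sequence gets, so a single fixed threshold (chosen, say, from scores on normal training data) can be used to flag anomalous observations.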

Ahmed Nassar
  • Can you point to the source of such renormalization techniques? It is not clear to me why this difference should correspond to an anomaly. – Yury Feb 17 '16 at 18:33