
I'm currently working on anomaly detection in time series, and one of the discords I'm trying to detect is the 'mean shift', i.e. the signal suddenly shifting by a certain value while retaining its overall shape and motifs (a noisy Heaviside step function being the simplest example).

I've just discovered the Matrix Profile (MP) and am trying to see whether it can be used to solve this problem, since it can detect regime changes with the CAC (corrected arc curve). One problem, however, is that the z-normalization of the MP "erases" this regime change, since the motif is the same before and after the changepoint. I've tried working with a plain Euclidean distance instead, but it doesn't mesh well with non-stationary time series, as all subsequences then have very few close neighbors.

[Figure: Series and CAC curve]
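To make the "erasing" effect concrete, here is a minimal sketch in plain Python. The sine motif, the window length `m`, the shift of 2 and the helper names `znorm` and `euclidean` are assumptions chosen for illustration; they are not taken from the data in the question. It compares the raw and z-normalized Euclidean distances between a subsequence and a copy of it shifted in mean.

```python
import math

def znorm(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd for v in x]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

m = 32                                                       # subsequence length (assumed)
before = [math.sin(2 * math.pi * i / m) for i in range(m)]   # motif before the shift
after = [v + 2.0 for v in before]                            # same motif, mean shifted by 2

d_raw = euclidean(before, after)               # grows with the size of the shift
d_z = euclidean(znorm(before), znorm(after))   # shift is removed by z-normalization

print(f"raw Euclidean: {d_raw:.3f}, z-normalized: {d_z:.3f}")
```

The z-normalized distance comes out at essentially zero, so the two regimes look like perfect neighbors of each other, whereas the raw distance is 2·sqrt(m).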

I was wondering whether there is any way to circumvent this (perhaps by introducing a custom distance function?), or whether the MP is simply not suited to this problem.

staalgebre
  • Maybe you should explain/link to *Matrix Profile*, might not be known by many ... – kjetil b halvorsen Jun 22 '21 at 00:49
  • Hi @staalgebre. The name for what you are looking to do is changepoint detection. There are many methods that work for non-stationary time series. I would point you towards the paper "Most Recent Changepoint Detection in Panel Data" (https://arxiv.org/pdf/1609.06805.pdf), which has an R implementation (https://cran.r-project.org/web/packages/changepoint.mv/changepoint.mv.pdf). If the dimension of your data is fairly small, it should work well. – David Veitch Jun 23 '21 at 15:04

1 Answer

Hello (inventor of the MP here).

You can "cripple" the z-normalization in the MP. However, if you want to find mean shifts, there are much easier ways. For example:

    data = zscore(cumsum(randn(2^12, 1)));   % make a z-normalized random walk
    data(1:2000) = data(1:2000) + 2;         % add a mean shift
    plot(abs(movmean(data, 20) - circshift(movmean(data, 20), 20)));  % peak at shift
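For readers without MATLAB, the same idea can be sketched in plain Python. This is a loose rendering rather than a literal translation: it uses a trailing moving average instead of `movmean`'s centered window, it avoids the wrap-around of `circshift`, and it runs on a synthetic noisy step rather than a random walk. The helper names `moving_mean` and `shift_score`, the noise level and the window `w` are assumptions made for illustration.

```python
import random

def moving_mean(x, w):
    """Trailing moving average; the first w-1 outputs use a shorter window."""
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= w:
            acc -= x[i - w]          # drop the sample that left the window
        out.append(acc / min(i + 1, w))
    return out

def shift_score(x, w):
    """|mean of the last w samples - mean of the w samples before them|."""
    mm = moving_mean(x, w)
    head = [0.0] * (2 * w - 1)       # not enough history for two full windows
    return head + [abs(mm[i] - mm[i - w]) for i in range(2 * w - 1, len(x))]

random.seed(0)
n, c, w = 1000, 500, 20              # length, true changepoint, window (assumed)
x = [random.gauss(0.0, 0.3) + (2.0 if i >= c else 0.0) for i in range(n)]

score = shift_score(x, w)
peak = max(range(n), key=score.__getitem__)
est = peak - w + 1                   # the score peaks w-1 samples after the change
print("estimated changepoint:", est)
```

The window `w` trades off noise suppression against localization: a larger window averages out more noise but smears the peak over more samples.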

  • I tried similar approaches at first, but the data I work with is a bit too 'unstable' for them to work on small shifts, which is why I thought a more 'global' approach might be better suited (I am working purely offline). Regarding crippling the normalization, I have tried replacing the distance with a plain Euclidean one. It gave pretty good results on my stationary data, but it struggles with data exhibiting any sort of trend. Would something like a 'semi'-normalization work (as in, setting the means to zero but keeping the variances as they are)? – staalgebre Jul 12 '21 at 06:45
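As a quick check on the 'semi'-normalization idea, the sketch below computes a mean-centered (but not variance-scaled) Euclidean distance on a synthetic sine motif. The motif, the offsets and the helper names `center` and `euclidean` are illustrative assumptions, not the data from the question. It suggests that subtracting the mean alone already makes the distance blind to a pure offset, while amplitude differences and windows that straddle the changepoint still register.

```python
import math

def center(x):
    """Subtract the mean but leave the variance untouched."""
    mu = sum(x) / len(x)
    return [v - mu for v in x]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

m = 32                                                     # window length (assumed)
motif = [math.sin(2 * math.pi * i / m) for i in range(m)]
shifted = [v + 2.0 for v in motif]                         # same shape, mean shifted by 2
scaled = [3.0 * v for v in motif]                          # same shape, 3x the amplitude
straddle = motif[: m // 2] + shifted[m // 2:]              # window containing the change

d_shift = euclidean(center(motif), center(shifted))        # offset is removed entirely
d_scale = euclidean(center(motif), center(scaled))         # amplitude difference survives
d_straddle = euclidean(center(motif), center(straddle))    # the step itself survives

print(f"shifted: {d_shift:.3f}, scaled: {d_scale:.3f}, straddling: {d_straddle:.3f}")
```

Under these assumptions, mean-centering would only flag the subsequences that overlap the changepoint; subsequences lying wholly before or after it would still match each other, much as with full z-normalization.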