I have been asked to do some real-time data analysis. The data values represent parameters of phone calls for a telco (number of calls, call length, etc.). If the numbers suddenly drop or spike, I'm supposed to raise an alarm. Normal noise is expected, but we want to know if things change suddenly, for example if no one is routing calls over a line. So individual outliers are not an issue, but if the whole set changes significantly, we want to know.

I can handle the programming, but I'm not sure what statistical quantities I should be checking.

My first guess was to compute a moving average (of number of calls per minute, for example) and a moving standard deviation, and trigger an alarm if the average changes more than one standard deviation. But looking over this site, I think that might be terribly naïve. What would be a good way to detect these changes?
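
To make that first guess concrete, here is a minimal sketch of it, assuming the data is a list of calls-per-minute counts. The function name and the `baseline`, `recent` and `threshold` parameters are placeholders chosen for illustration, not tuned values.

```python
from statistics import mean, stdev


def moving_average_alarms(counts, baseline=30, recent=5, threshold=1.0):
    """Flag times where the recent average leaves the baseline band.

    The mean of the last `recent` points is compared with the mean of the
    `baseline` points before them, in units of the baseline standard
    deviation. All three parameters are illustrative assumptions.
    """
    alarms = []
    for t in range(baseline + recent, len(counts) + 1):
        base = counts[t - recent - baseline:t - recent]
        last = counts[t - recent:t]
        sigma = stdev(base)
        if sigma == 0:
            continue  # flat baseline: the ratio is undefined, so skip
        shift = (mean(last) - mean(base)) / sigma
        if abs(shift) > threshold:
            alarms.append((t - 1, shift))  # index of the newest point
    return alarms


# Made-up data: 40 steady minutes, then the line goes dead.
counts = [100, 102, 98, 101, 99] * 8 + [0] * 5
alarms = moving_average_alarms(counts)
print(alarms[0])
```

With this made-up series the first alarm fires at the first dead minute, but the window lengths and the one-standard-deviation threshold are exactly the arbitrary choices I am unsure about.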

Ron Romero

2 Answers

You are quite right that your well-intentioned suggestion is naive. I will try to point out the deficiencies while suggesting an approach.

A simple moving average is inadequate because you are assuming how many values to use and the weights to apply to each value. An ARIMA model is a superset of the simple moving average because it determines how many values to use and precisely what the weight is for each value. Secondly, the standard deviation that you refer to is inadequate because it assumes that the expected value is equal to the mean and that there are no unusual data points. What is correct is the standard deviation of the residuals from a model that fully characterizes the historical data.

What you are trying to do is to compute the probability of observing the most recent value BEFORE you observed it. This requires isolating historical pulses, level shifts, seasonal pulses and/or local time trends while ensuring that the parameters didn't change over time and that the variance of the errors has been made constant (if necessary). When you reflect that you are concerned with major changes, not small changes, you are simply saying "if it is a pulse or a level shift, ignore impacts/coefficients that are less than some user-specified value."

You might look at Intervention Detection concepts, for example the question "How to detect a significant change in time series data due to a 'policy' change?", and perhaps try googling "automatic intervention detection".
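
As a rough illustration of judging the newest value against residuals from a model rather than against the raw standard deviation, here is a minimal sketch. It assumes a plain AR(1) model fitted by least squares, which is only a stand-in for a properly identified ARIMA model, and it does not handle pulses, level shifts, seasonality or changing variance. The names `fit_ar1` and `surprise` are made up for this example.

```python
from statistics import mean, stdev


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t].

    Returns (c, phi, sigma), where sigma is the sample standard
    deviation of the in-sample residuals.
    """
    prev, curr = series[:-1], series[1:]
    mp, mc = mean(prev), mean(curr)
    sxx = sum((p - mp) ** 2 for p in prev)
    sxy = sum((p - mp) * (x - mc) for p, x in zip(prev, curr))
    phi = sxy / sxx if sxx else 0.0
    c = mc - phi * mp
    residuals = [x - (c + phi * p) for p, x in zip(prev, curr)]
    return c, phi, stdev(residuals)


def surprise(history, new_value):
    """Standardized one-step-ahead forecast error for new_value."""
    c, phi, sigma = fit_ar1(history)
    if sigma == 0:
        raise ValueError("residuals have zero variance")
    forecast = c + phi * history[-1]
    return (new_value - forecast) / sigma


# Made-up history: 60 values fluctuating between 95 and 105.
history = [100 + ((i * 7) % 11 - 5) for i in range(60)]
print(surprise(history, 100))  # an ordinary value
print(surprise(history, 0))    # the line going dead
```

The forecast is made from the history alone, so the score reflects how unlikely the new value was before it was observed. A user-specified cutoff on its absolute value then separates major changes from small ones.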

IrishStat

You could use the Numenta Anomaly Benchmark (NAB).

From the project's description:

NAB is a novel benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is comprised of over 50 labeled real-world and artificial timeseries data files plus a novel scoring mechanism designed for real-time applications.

Prokhozhii