anomaly detection in time series training data

Question

I have a dataset that measures the number of people passing through a monitored region. I have the raw counts for the last two months at 5-minute intervals (i.e. people were counted for 30 seconds every 5 minutes).

Now, there is some seasonality in the data:

The footfall is high at certain times of the day (for example, when people go to work, leave work, or take lunch).
The footfall is low on the weekend.

What I am interested in is finding anomalies in this dataset, i.e. times when the footfall is unusually high or low. My issue is that the training data itself contains these anomalies and is not labelled.

So, my question is: is there a way to describe (for a given time of the day) what the normal footfall range might look like? For example, I can take the mean for a given time window, compute the standard deviation, and say that any value beyond some multiple of the standard deviation is an anomaly.
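The baseline described above could be sketched roughly as follows. This is only an illustration: the function name `flag_by_time_window`, the grouping by weekend flag plus 5-minute slot, the cutoff `k=3.0`, and the synthetic counts are all assumptions made for the example, not part of the dataset in question.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def flag_by_time_window(readings, k=3.0):
    """Return the (timestamp, count) pairs lying more than k standard
    deviations from the mean of their time-of-day group.

    readings: list of (datetime, count) tuples.
    """
    def slot(ts):
        # Weekend flag plus time of day, so weekdays and weekends
        # get separate "normal" ranges (an assumption of this sketch).
        return (ts.weekday() >= 5, ts.hour, ts.minute)

    groups = defaultdict(list)
    for ts, count in readings:
        groups[slot(ts)].append(count)

    flagged = []
    for ts, count in readings:
        values = groups[slot(ts)]
        if len(values) < 2:
            continue  # not enough history to estimate a spread
        mu, sigma = mean(values), stdev(values)
        if sigma > 0 and abs(count - mu) > k * sigma:
            flagged.append((ts, count))
    return flagged


# Synthetic example: one 09:00 reading per day for four weeks,
# with a single injected spike on the eleventh day.
start = datetime(2019, 6, 3, 9, 0)  # a Monday
readings = []
for i in range(28):
    ts = start + timedelta(days=i)
    base = 20 if ts.weekday() >= 5 else 100
    readings.append((ts, base + i % 2))
readings[10] = (readings[10][0], 500)

flagged = flag_by_time_window(readings)
```

Note that the spike itself inflates both the mean and the standard deviation of its group, which is exactly the contamination problem raised above: with few observations per slot, a large anomaly can partly mask itself.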

But beyond this simple analysis, is there something else I can try?

score 3 · Accepted Answer · answered Aug 14 '19 at 21:58

Your basic idea is right; to formalize it, I would say you should just "de-trend" the data and then apply anomaly/changepoint detection. So, fit a time series regression model that accounts for the seasonality and other "expected" sources of variation, and then look for anomalies in the model residuals.
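A minimal sketch of the "model the expected pattern, then inspect the residuals" idea is given below. It makes several assumptions that are not in the answer: the per-cell median of each (weekend flag, time slot) stands in for a proper time series regression, residuals are scored with a robust z-score based on the median absolute deviation (MAD), and the name `seasonal_residual_anomalies` and the cutoff of 3.5 are chosen purely for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def seasonal_residual_anomalies(readings, threshold=3.5):
    """Fit an expected count per (weekend flag, time slot) and flag
    readings whose residual is large on a robust scale.

    readings: list of (datetime, count) tuples.
    Returns a list of (timestamp, count, residual) tuples.
    """
    def cell(ts):
        return (ts.weekday() >= 5, ts.hour, ts.minute)

    # "Seasonal model": the median count of each cell. Medians are used
    # because the unlabelled anomalies would drag a mean towards them.
    by_cell = defaultdict(list)
    for ts, count in readings:
        by_cell[cell(ts)].append(count)
    expected = {key: median(vals) for key, vals in by_cell.items()}

    # Residual = observed count minus what the seasonal model expects.
    residuals = [count - expected[cell(ts)] for ts, count in readings]

    # Robust scale of the residuals: median absolute deviation.
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        return []  # no spread to scale against in this simple sketch

    # 0.6745 makes the score comparable to a z-score for normal data.
    return [
        (ts, count, r)
        for (ts, count), r in zip(readings, residuals)
        if abs(0.6745 * (r - center) / mad) > threshold
    ]


# Synthetic example: daily 09:00 readings with one injected spike.
start = datetime(2019, 6, 3, 9, 0)  # a Monday
readings = []
for i in range(28):
    ts = start + timedelta(days=i)
    base = 20 if ts.weekday() >= 5 else 100
    readings.append((ts, base + i % 2))
readings[10] = (readings[10][0], 500)

anomalies = seasonal_residual_anomalies(readings)
```

A regression with time-of-day and day-of-week terms (plus any trend) would replace the `expected` lookup in a fuller treatment; the residual step stays the same.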

IrishStat · Answer 2 · 2019-08-15T12:23:18.407

Generalizing on @Sheridan: identify a model that uses anthropomorphic structures as well as ARIMA structure, and use Intervention Detection procedures to identify latent deterministic structure, which includes pulses, level shifts, seasonal pulses and local time trends. The post "Simple method of forecasting number of guests given current and historical data" might be helpful, as it studies restaurant data.
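To make the distinction between two of these structures concrete, here is a rough sketch that labels a pulse (a one-off deviation) versus a level shift (a sustained change) in a series of residuals. It is not the Intervention Detection procedure referred to above, nor anything from AUTOBOX; the function name `classify_interventions`, the window length, and the threshold are hypothetical choices, and seasonal pulses and local time trends are not handled.

```python
from statistics import median


def classify_interventions(series, window=6, threshold=5.0):
    """Label indices of a (deseasonalized) series as 'pulse' or
    'level_shift' using medians of neighbouring windows.

    Returns a list of (index, label) tuples.
    """
    events = []
    t = window
    while t < len(series) - window:
        # Typical level just before the candidate point.
        baseline = median(series[t - window:t])
        if abs(series[t] - baseline) > threshold:
            # Typical level just after the candidate point.
            following = median(series[t + 1:t + 1 + window])
            if abs(following - baseline) > threshold:
                # The series stays away from the old level: level shift.
                events.append((t, "level_shift"))
                t += window  # move the baseline fully onto the new level
                continue
            # The series returns to the old level: isolated pulse.
            events.append((t, "pulse"))
        t += 1
    return events


# Synthetic residuals: a one-off spike at index 10 and a sustained
# jump to a new level from index 25 onwards.
residuals = [0.0] * 40
residuals[10] = 20.0
for i in range(25, 40):
    residuals[i] = 10.0

events = classify_interventions(residuals)
```

The point of the distinction is that a pulse should be treated as an anomaly at one time point, whereas a level shift changes what "normal" means for everything that follows.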

Trends may or may not exist. Level shifts may or may not exist. Seasonal factors may change over time. Outliers often can't be detected if there is ARIMA structure present. Hourly patterns for weekends are often different from those for weekdays.

For longer time series than yours, lead and lag effects of holidays often come into play. Certain days of the month may have special patterns. The particular week of the month may be important, etc.

Thanks for that. In that other post, do you have any R or Python code to replicate those results? — Luca, Aug 15 '19 at 09:11
No code per se. But there is a version of AUTOBOX available in R, which I have helped to develop, that could be useful. — IrishStat, Aug 15 '19 at 10:15
