I have a dataset that records the number of people passing through a monitored region. The raw counts cover the last two months at 5-minute intervals (i.e. people were counted for 30 seconds every 5 minutes).
There is some seasonality in the data:
- The footfall is high at certain times of the day (for example, when people go to work, leave work, or take lunch).
- The footfall is low on weekends.
What I am interested in is finding anomalies in this dataset, i.e. times when the footfall is unusually high or low. My issue is that the training data itself contains these anomalies and is not labelled.
So my question is: is there a way to describe, for a given time of the day, what the normal footfall range might look like? For example, I could take the mean and standard deviation for a given time window and say that any value more than some multiple of the standard deviation away from the mean is an anomaly.
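To make the mean and standard deviation idea concrete, here is a minimal sketch in plain Python. The function names (`slot_key`, `fit_baseline`, `flag_anomalies`), the weekday/weekend split, the threshold `k=3.0`, and the synthetic readings are all assumptions made for illustration, not part of the real dataset. Because the unlabelled anomalies are included when the mean and standard deviation are computed, they inflate both, which is exactly the weakness described above.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev


def slot_key(ts):
    """Group readings by (is_weekend, hour, minute).

    Assumption: weekdays and weekends get separate profiles,
    since footfall is lower on weekends.
    """
    return (ts.weekday() >= 5, ts.hour, ts.minute)


def fit_baseline(readings):
    """Return {slot: (mean, std)} from an iterable of (datetime, count)."""
    groups = defaultdict(list)
    for ts, count in readings:
        groups[slot_key(ts)].append(count)
    # stdev needs at least two values, so sparser slots are skipped.
    return {k: (mean(v), stdev(v)) for k, v in groups.items() if len(v) >= 2}


def flag_anomalies(readings, baseline, k=3.0):
    """Return readings more than k standard deviations from their slot mean."""
    flagged = []
    for ts, count in readings:
        stats = baseline.get(slot_key(ts))
        if stats is None:
            continue  # no baseline for this slot
        mu, sigma = stats
        if abs(count - mu) > k * sigma:
            flagged.append((ts, count))
    return flagged


# Synthetic example (made-up numbers): one 09:00 reading per day for
# 8 weeks, hovering around 20, with a single injected spike of 200.
start = datetime(2024, 1, 1, 9, 0)  # a Monday
readings = []
for d in range(56):
    count = 19 if d % 2 == 0 else 21
    if d == 10:
        count = 200  # injected anomaly
    readings.append((start + timedelta(days=d), count))

baseline = fit_baseline(readings)
flagged = flag_anomalies(readings, baseline)
print(flagged)
```

With the real data, `readings` would hold one `(timestamp, count)` pair per 5-minute sample, so each slot would see roughly 40 weekday or 16 weekend values over two months.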
But beyond this simple analysis, is there something else I can try?