
We are monitoring the performance of a network using several KPIs. We want to detect whether there are any anomalies in a KPI X within the last hour, for which we (usually) obtain 30 values. We don't know the population standard deviation, BUT we can calculate the standard deviation of KPI X from day zero (when the monitoring first started).
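
If the day-zero mean and standard deviation are treated as known population values, the natural check for the last hour is a one-sample z-test on the hourly mean. A minimal sketch, assuming the hourly values are independent; `z_test_last_hour`, `baseline_mean` and `baseline_sd` are hypothetical names, not an established API:

```python
import math
from statistics import NormalDist, fmean


def z_test_last_hour(values, baseline_mean, baseline_sd):
    """Two-sided one-sample z-test of the last hour's mean.

    baseline_mean and baseline_sd are treated as known population values
    (e.g. computed from the historical data); values are assumed independent.
    """
    n = len(values)
    standard_error = baseline_sd / math.sqrt(n)
    z = (fmean(values) - baseline_mean) / standard_error
    # Two-sided p-value from the standard normal distribution
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value
```

The whole test rests on `baseline_sd` really being the population SD, which is exactly what the first question below asks.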

My first question: if I calculate the standard deviation of KPI X from day zero, will that be the population standard deviation?
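
Strictly, an SD computed from all data since day zero is still a sample estimate of the population σ (`statistics.stdev` divides by n - 1, `statistics.pstdev` by n; with that much history the difference is negligible). Whether it can stand in for σ depends mainly on σ being stable over time. A rough sketch of one way to check that, assuming the history is a flat list in time order; `sd_stability` and `window` are hypothetical:

```python
from statistics import stdev


def sd_stability(history, window):
    """Compare the all-history sample SD with the SD of consecutive windows.

    Returns (overall_sd, ratios), one ratio per full window. Ratios far from 1
    suggest the day-zero SD is not a stable stand-in for sigma. window >= 2.
    """
    overall = stdev(history)
    per_window = [
        stdev(history[i:i + window])
        for i in range(0, len(history) - window + 1, window)
    ]
    return overall, [s / overall for s in per_window]
```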

Secondly, which test is more reliable and appropriate for a situation like this? I am thinking a t-test is the way to go, but sometimes we get 600 values per hour, so I am not sure whether a t-test is suitable for such a large number of values.
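
A one-sample t-test replaces the historical SD with the SD of the hour's own values, and it works the same way for 30 or 600 values; only the degrees of freedom (n - 1) change. A standard-library-only sketch, with the two-sided p-value obtained by numerically integrating the Student's t density (Simpson's rule) rather than calling a statistics package; all function names are hypothetical. If SciPy is available, `scipy.stats.ttest_1samp` does the same job.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    # Student's t density; lgamma keeps large df (e.g. 599) from overflowing
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t, df, steps=2000):
    # P(|T| >= |t|) = 1 - 2 * integral of the density over [0, |t|],
    # approximated with composite Simpson's rule (steps must be even)
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def one_sample_t_test(values, baseline_mean):
    # t statistic uses the sample SD of this hour's values, df = n - 1
    n = len(values)
    t = (fmean(values) - baseline_mean) / (stdev(values) / math.sqrt(n))
    return t, t_two_sided_p(t, n - 1)
```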

EDIT: A general question on this topic exists here, but I would like to get your opinions on this situation specifically.

Cemre
  • I saw that question (which suggests a t-test), but I would like to get your opinion on this situation specifically. – Cemre Nov 05 '15 at 08:38
  • Some quick thoughts: there might be a substantial amount of autocorrelation in your data. In this case the t-test (and the z-test) are inappropriate, since they assume independent observations. In your case I would not even be surprised if the number of samples per hour were also correlated with your KPIs, which could make things even more complicated. Finally, taking the KPI standard deviation from day zero is almost certainly wrong, since it assumes that the standard deviation remains stable over time. – Erik Nov 05 '15 at 08:39
  • @Tim Not a duplicate in my opinion, as the situation is more specific. – Erik Nov 05 '15 at 08:42
  • @Erik Thanks for your comment. What would you suggest for a statistical anomaly test in this case? I also read about Grubbs' test, so maybe I could just use that to find outliers? – Cemre Nov 05 '15 at 08:45
  • @Cemre The t-test has no problems with big samples (the t-distribution converges to the normal distribution as the sample grows). Look at the thread I linked to; it provides a clear answer: "always use the t-test if you don't know the population standard deviation a priori". Another question, as Erik noticed, is whether you should really use either the t-test or the z-test here rather than some other method, but that seems to be a different question. – Tim Nov 05 '15 at 08:49
  • @Cemre I would not recommend much until I understand your goals exactly, i.e. do you want to detect the KPI falling below a threshold (no statistical test needed, in my opinion) or detect a downward trend early? To be honest, I suppose this is one of those questions where I would spend a couple of hours of consultation before giving any recommendations, so I probably won't answer it. I would just say take a look at the process control literature and maybe time series analysis, and try to find a very similar problem. – Erik Nov 05 '15 at 08:52
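
To gauge Erik's autocorrelation concern, one rough check is the lag-1 autocorrelation of the hourly values and the effective sample size it implies under an AR(1) model. The AR(1) assumption and the helper names are illustrative only; real KPI series may need a fuller time-series model.

```python
from statistics import fmean


def lag1_autocorrelation(values):
    # Sample lag-1 autocorrelation: how strongly each value tracks the next one
    m = fmean(values)
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    den = sum((v - m) ** 2 for v in values)
    return num / den


def effective_sample_size(values):
    # AR(1) approximation: n_eff = n * (1 - r) / (1 + r).
    # Positive r shrinks n_eff below n, so an i.i.d. t- or z-test that uses n
    # overstates the evidence when the data are autocorrelated.
    r = lag1_autocorrelation(values)
    return len(values) * (1 - r) / (1 + r)
```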
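
Grubbs' test, which Cemre mentions, flags the single most extreme value in a sample assumed to be approximately normal and independent; it targets an individual outlying value rather than a shifted hourly mean. A stdlib-only sketch of the two-sided version, using the usual critical value G_crit = (n - 1)/sqrt(n) * sqrt(t² / (n - 2 + t²)), where t is the upper α/(2n) quantile of Student's t with n - 2 degrees of freedom. The quantile is found by bisection on a numerically integrated tail, and all names are hypothetical.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    # Student's t density; lgamma avoids overflow for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x, df, steps=2000):
    # P(T > x) for x >= 0: 0.5 minus Simpson's-rule integral of the density on [0, x]
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3.0


def t_upper_quantile(p, df):
    # Solve P(T > x) = p for x (0 < p < 0.5): bracket by doubling, then bisect
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:
        lo, hi = hi, hi * 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_two_sided(values, alpha=0.05):
    """Two-sided Grubbs' test for a single outlier (needs n >= 3).

    Returns (index of most extreme value, G, G_crit, is_outlier).
    """
    n = len(values)
    m, s = fmean(values), stdev(values)
    idx = max(range(n), key=lambda i: abs(values[i] - m))
    g = abs(values[idx] - m) / s
    t = t_upper_quantile(alpha / (2 * n), n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
    return idx, g, g_crit, g > g_crit
```

Like the t- and z-tests, this assumes independent observations, so the autocorrelation caveat applies here too.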
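
Tim's point that the t-distribution converges to the normal can be seen numerically: the largest gap between the Student's t density and the standard normal density shrinks quickly as the degrees of freedom grow. So with 600 values per hour, a t-test and a z-test computed from the same SD estimate give practically the same answer. A small sketch; `max_density_gap` and its grid settings are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    # Student's t density; lgamma avoids overflow for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def max_density_gap(df, limit=6.0, step=0.01):
    # Largest |t density - normal density| on an even grid over [-limit, limit]
    normal = NormalDist()
    n_points = int(round(2 * limit / step)) + 1
    xs = [-limit + i * step for i in range(n_points)]
    return max(abs(t_pdf(x, df) - normal.pdf(x)) for x in xs)


for df in (5, 29, 599):
    print(f"df={df}: max density gap {max_density_gap(df):.5f}")
```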
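
Erik points to the process control literature; one standard tool there for early detection of a gradual shift is the EWMA control chart. A minimal sketch, assuming an in-control target and σ for individual observations are known (for example from a stable historical period) and that observations are independent. The values λ = 0.2 and L = 3 are common textbook choices rather than tuned values, and `ewma_chart` is a hypothetical helper.

```python
import math


def ewma_chart(values, target, sigma, lam=0.2, L=3.0):
    """EWMA control chart for individual observations.

    Returns one (ewma, lower_limit, upper_limit, out_of_control) tuple per value.
    target and sigma are assumed known from an in-control period.
    """
    z = target  # the EWMA starts at the target value
    rows = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        # Exact time-varying limits; they widen toward the steady-state width
        width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        lower, upper = target - width, target + width
        rows.append((z, lower, upper, not lower <= z <= upper))
    return rows
```

Smaller λ makes the chart more sensitive to small, persistent shifts (such as a slow downward trend) at the cost of reacting more slowly to sudden jumps.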
