
Let us say I have several KPI measurements per month. There may be different numbers of measurements per month, but there are always more than 30. The monthly distributions do not look normal, as shown in this raincloud plot:

[raincloud plot of the monthly KPI distributions]

Maybe one could just apply a log or Box-Cox transformation and then use standard statistical process control charts?
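To make the transformation idea concrete, here is a minimal sketch of fitting a Box-Cox transformation to skewed, positive-valued data. The data here is made up (lognormal draws standing in for the KPI measurements); qicharts2 is an R package, so this is just a language-agnostic illustration using scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# hypothetical skewed KPI measurements (Box-Cox requires strictly positive data)
kpi = rng.lognormal(mean=0.0, sigma=0.5, size=200)

# fit lambda by maximum likelihood and transform in one step
transformed, lam = stats.boxcox(kpi)
```

The transformed values can then be fed into a standard chart, though the control limits live on the transformed scale and must be back-transformed for reporting.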

Using the qicharts2 package's xbar chart without any data transformation, the following chart is produced:

[xbar chart produced by qicharts2, showing points beyond the control limits]

There are breaches of the control limits, but I am worried that these could be misleading, as the data is clearly not normal. Mind you, with more than 30 measurements per month, could the CLT not be invoked?
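The CLT argument can be checked with a quick simulation: even for strongly skewed data, the distribution of monthly means (n ≥ 30 per month) is much closer to symmetric than the raw measurements. A minimal sketch with made-up lognormal data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# 2000 simulated "months", each with 30 skewed measurements
raw = rng.lognormal(0.0, 0.5, size=(2000, 30))
monthly_means = raw.mean(axis=1)

# skewness of the raw data vs. skewness of the monthly means:
# the means are far less skewed, as the CLT suggests
print(stats.skew(raw.ravel()), stats.skew(monthly_means))
```

This supports using an xbar chart on the monthly means even when the individual measurements are non-normal, although the convergence is slower for heavier-tailed data.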

Could I use transformations or are there other methods/packages I could use to detect such breaches? I am happy to implement something from scratch.

Any pointers very much appreciated. Thanks!

PS:

Looking at this:

Quality control - non-normal distribution (answer by Tavrock). Maybe one could simply determine the 99.5th, 50th and 0.5th percentiles of the whole data set and then flag "outliers" using the upper and lower percentiles?

  • Are the elements of your multivariate observation at each time point linked? I.e. is variable 1 from time 1 measuring the same thing as variable 1 from time 2? – adunaic Jan 08 '21 at 13:32
  • Are you interested in movement of the cloud as a whole or, if linked, the individual KPIs? – adunaic Jan 08 '21 at 13:32
  • each data point is independent. – cs0815 Jan 08 '21 at 13:52
  • It appears you have plenty of historical data, so calculating the 95th and 99th percentiles (and other desired points) is a sensible strategy for your process. And yes, if you are concerned with longer-term trends, several points could be averaged together to take advantage of the CLT. – Dave2e Jan 08 '21 at 14:08
  • @Dave2e one issue I have with using percentiles is that as the number of data points increases, so does the number of data points above the percentile threshold. I just came across the robust z-score ... – cs0815 Jan 08 '21 at 14:18
  • If your process is stable, each point is independent, and the percentile calculations are based on a large sample of historical data, then the percentage of data points outside the limits will remain constant. This is the basis of statistical process control. If the percentage of data points above the percentile threshold increases, then your process is unstable and not in control. That is the trigger to investigate why. – Dave2e Jan 08 '21 at 15:04
  • @Dave2e - sure but one cannot simply apply techniques that assume normality .... – cs0815 Jan 08 '21 at 15:43
  • Percentile calculations do not assume any distribution or involve the mean, standard deviation or any other characterisation of the distribution. The 95th percentile means 95% of the points are below that limit. Yes, for the normal distribution the shortcut calculation is mean + 1.96*sd. In your case you need to sort your values from low to high and find the value X such that 95% of the points are below X. – Dave2e Jan 08 '21 at 16:33
  • If you are just interested in how the cloud of points moves then you can generate new series monitoring the quantities of interest, e.g., mean, variance, median, 70th quantile,.... then look for changes in those either univariately or multivariately. – adunaic Jan 14 '21 at 15:21
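adunaic's suggestion above, monitoring derived series of summary statistics, can be sketched as follows (made-up data; the individuals-style 3-sigma limits on the median series are one simple choice, not the only one):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical data: 12 months with varying numbers of skewed measurements
months = [rng.lognormal(0.0, 0.5, size=int(n)) for n in rng.integers(30, 60, size=12)]

# collapse each month into summary statistics and monitor those series
medians = np.array([np.median(m) for m in months])
q70 = np.array([np.quantile(m, 0.70) for m in months])

# simple individuals-style 3-sigma limits on the median series
centre = medians.mean()
sigma = medians.std(ddof=1)
ucl, lcl = centre + 3.0 * sigma, centre - 3.0 * sigma
out_of_control = (medians > ucl) | (medians < lcl)
```

Each derived series (median, 70th quantile, variance, ...) can be charted separately, or combined into a multivariate monitoring scheme.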
