I am trying to implement outlier detection in Python using the z-score calculation from scipy.stats.
I thought that a threshold of 2 standard deviations around the mean would be enough to detect outliers, but it is not that easy.
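The threshold rule described above can be sketched as follows. This is a minimal sketch, assuming a hypothetical helper name `zscore_outliers` (not part of SciPy) and SciPy's default population standard deviation (`ddof=0`) in `stats.zscore`; the toy data is illustrative only.

```python
import numpy as np
import scipy.stats as stats


def zscore_outliers(values, threshold=2.0):
    """Hypothetical helper: True where |z-score| exceeds `threshold`.

    stats.zscore uses the population standard deviation (ddof=0) by default.
    """
    z = stats.zscore(np.asarray(values, dtype=float))
    return np.abs(z) > threshold


# Toy data (not from the question): one value far above the rest.
print(zscore_outliers([10.0, 10.5, 9.8, 10.2, 10.1, 30.0]))
```

Applied to the two datasets below, a threshold of 2 would flag both the 135.78 (z = 3) and the 0.0 entries (z ≈ -2.38), which shows why a fixed cutoff on the plain z-score is fragile.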
For example, for the following data I get the following z-scores:
import scipy.stats as stats
test = [135.77,135.77,135.77,135.77,135.77,135.77,135.77,135.77,135.77,135.78]
print(stats.zscore(test))
[-0.33333333 -0.33333333 -0.33333333 -0.33333333 -0.33333333 -0.33333333
-0.33333333 -0.33333333 -0.33333333 3. ]
Note the z-score of 3 for the last value, which is only 0.01 higher than the others. Because all the preceding values are identical, the standard deviation is tiny, so even this minimal difference produces a large z-score.
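The size of the difference actually does not matter at all here. With n − 1 values equal to a and one value a + d (d > 0), the mean is a + d/n and the population standard deviation is d·sqrt(n − 1)/n, so the lone value's z-score is (d − d/n) / (d·sqrt(n − 1)/n) = sqrt(n − 1), whatever d is. For n = 10 that is exactly 3. By Samuelson's inequality, sqrt(n − 1) is also the largest |z| any point can reach with `ddof=0`, so with 10 values no z-score can ever exceed 3. The sketch below checks this numerically for a few arbitrary values of d:

```python
import numpy as np
import scipy.stats as stats

# Nine identical values plus one value that differs by `delta`.
# With ddof=0, the lone value's z-score is sqrt(n - 1) for ANY nonzero delta,
# so its size says nothing about how far away the value really is.
for delta in (0.01, 1.0, 1000.0):
    data = np.array([135.77] * 9 + [135.77 + delta])
    z_last = stats.zscore(data)[-1]
    print(f"delta={delta:>8}: z of last value = {z_last:.6f}")

print("sqrt(n - 1) =", np.sqrt(len(data) - 1))
```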
On the other hand, the following values contain extreme outliers (the 0.0 entries), yet their z-scores stay below 3 in absolute value:
test = [135.0, 135.86, 135.5, 134.96, 135.5, 135.68, 135.41, 134.96, 135.68, 135.68,
0.0, 135.77, 135.05, 135.32, 135.68, 135.77, 135.05, 135.86, 0.0, 0.0]
print(stats.zscore(test))
[ 0.41067506 0.42845544 0.42101249 0.40984807 0.42101249 0.42473397
0.41915175 0.40984807 0.42473397 0.42473397 -2.3804309 0.4265947
0.4117088 0.41729102 0.42473397 0.4265947 0.4117088 0.42845544
-2.3804309 -2.3804309 ]
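This is the masking effect: the outliers are themselves used to estimate the mean and standard deviation, so they pull the mean down and inflate the standard deviation. If the clean values are tightly clustered, k identical outliers among n values get a z-score of roughly sqrt((n − k)/k) regardless of how far away they are; here sqrt(17/3) ≈ 2.38, matching the output above. The sketch below compares the statistics with and without the zeros; removing them by `test > 0` is for illustration only, since in practice you would not know which values are bad:

```python
import numpy as np

test = np.array([135.0, 135.86, 135.5, 134.96, 135.5, 135.68, 135.41, 134.96, 135.68, 135.68,
                 0.0, 135.77, 135.05, 135.32, 135.68, 135.77, 135.05, 135.86, 0.0, 0.0])

# The zeros take part in estimating mean and std: the mean drops and the std explodes.
print("mean, std (all values):    ", test.mean(), test.std())

# Illustration only: here we already know the zeros are the bad values.
clean = test[test > 0]
print("mean, std (without zeros): ", clean.mean(), clean.std())

# Measured against the statistics of the clean values, 0.0 is extremely far out.
print("z of 0.0 vs clean data:    ", (0.0 - clean.mean()) / clean.std())

# Approximation for k identical outliers when the clean values barely vary.
k = np.count_nonzero(test == 0.0)
print("sqrt((n - k) / k):         ", np.sqrt((len(test) - k) / k))
```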
Any ideas on how to detect outliers reliably using z-scores?
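One commonly used alternative, swapped in here instead of the plain z-score, is the modified z-score from Iglewicz and Hoaglin, which replaces the mean with the median and the standard deviation with the median absolute deviation (MAD), so a handful of outliers cannot mask themselves; they suggest flagging |M| > 3.5. The sketch below is a minimal version under stated assumptions: the helper name `modified_zscore_outliers` and its `min_scale` parameter are hypothetical. `min_scale` exists because in the first dataset more than half the values are identical, so the MAD is 0 and every differing value would be infinitely far away; a floor based on the measurement resolution of your data is one way around that.

```python
import numpy as np


def modified_zscore_outliers(values, threshold=3.5, min_scale=0.0):
    """Hypothetical helper: flag outliers with the median/MAD-based modified z-score.

    min_scale is a hypothetical floor for the MAD (in data units); without it,
    data where more than half the values are identical has MAD == 0.
    """
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    scale = max(mad, min_scale)
    if scale == 0:
        # Degenerate case: only values exactly at the median count as normal.
        return x != median
    # 0.6745 is approximately the 0.75 quantile of the standard normal, so for
    # normally distributed data the modified z-score is comparable to a z-score.
    modified_z = 0.6745 * (x - median) / scale
    return np.abs(modified_z) > threshold


first = [135.77] * 9 + [135.78]
second = [135.0, 135.86, 135.5, 134.96, 135.5, 135.68, 135.41, 134.96, 135.68, 135.68,
          0.0, 135.77, 135.05, 135.32, 135.68, 135.77, 135.05, 135.86, 0.0, 0.0]

print(np.flatnonzero(modified_zscore_outliers(first, min_scale=0.05)))
print(np.flatnonzero(modified_zscore_outliers(second)))
```

With these inputs the 135.78 is no longer flagged once a 0.05 floor is applied, while the three 0.0 entries in the second dataset are flagged by a wide margin because the median and MAD are barely affected by them.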