What mean should I use when there are a few large values?

Question

For a project, I need to calculate several means and then compare each of them to a distribution under the null hypothesis H0. However, sometimes a few values are much larger than all the others, so if I use a basic arithmetic mean, these values greatly affect the result.

Here is an example

Here, if I take the arithmetic mean I get about 90, while in reality we can see that the vast majority of values on the axis are around 50.

Can you suggest an average that would be appropriate in this kind of case?

score 0 · Answer 1 · answered Jan 23 '21 at 12:34

If you want to put a lot of emphasis on getting a robust estimate of central tendency, the median may be for you. That doesn't imply that the mean is invalid; it's just that the mean gets a lot of its weight from extreme values, so it may not be representative of the bulk of the data. You might consider emphasizing the median and its nonparametric confidence interval, but also presenting the mean. Even better, present the entire distribution by plotting the empirical cumulative distribution function (ECDF).
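As a rough illustration of the quantities mentioned above, here is a minimal Python sketch using only the standard library. The sample is made up to mimic the situation in the question (most values near 50, two very large ones); it is not the asker's data. The helper names `median_ci` and `ecdf` are my own, and the interval is the usual distribution-free one built from order statistics and the Binomial(n, 1/2) distribution, which is one common way to get a nonparametric confidence interval for the median.

```python
import math
import statistics

# Made-up sample (an assumption, not the asker's data):
# 18 values near 50 plus two very large ones.
data = [45, 46, 47, 48, 48, 49, 49, 50, 50, 50,
        51, 51, 52, 52, 53, 54, 55, 56, 400, 500]


def median_ci(values, conf=0.95):
    """Distribution-free CI for the median based on order statistics.

    Returns (x_(j), x_(n-j+1)) for the largest j such that
    2 * P(B <= j - 1) <= 1 - conf, where B ~ Binomial(n, 1/2).
    """
    x = sorted(values)
    n = len(x)
    alpha = 1 - conf
    cdf = 0.0  # running value of P(B <= k)
    j = 0      # rank of the lower endpoint (1-based); 0 means none found
    for k in range(n // 2):
        cdf += math.comb(n, k) / 2 ** n
        if 2 * cdf > alpha:
            break
        j = k + 1
    if j == 0:
        raise ValueError("sample too small for the requested confidence level")
    return x[j - 1], x[n - j]


def ecdf(values):
    """Return (value, cumulative proportion) pairs of the empirical CDF."""
    x = sorted(values)
    n = len(x)
    return [(v, (i + 1) / n) for i, v in enumerate(x)]


mean = statistics.mean(data)
median = statistics.median(data)
low, high = median_ci(data)

print(f"mean   = {mean:.1f}")
print(f"median = {median:.1f}, 95% CI = ({low}, {high})")
for value, prop in ecdf(data):
    print(f"{value:>4}  {prop:.2f}")
```

With this made-up sample the two large values pull the mean to about 90 while the median stays near 50, and the ECDF listing makes the two outliers visible as the final steps from 0.90 to 1.00.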

You presented the data in a non-traditional way. Reverse the x and y axes. Then superimpose the ECDF.
