For a project, I need to calculate several means and then compare each of them to a distribution under the null hypothesis H0. However, a few values are sometimes much larger than all the others, so a basic arithmetic mean is strongly affected by them.

Here is an example

[image: example plot of the values]

Here, the arithmetic mean is about 90, whereas the plot shows that the vast majority of values on the axis are around 50.

Is there a kind of average that is suited to this kind of case?

1 Answer

If you want to put a lot of emphasis on getting a robust estimate of central tendency, the median may be for you. That doesn't imply that the mean is invalid; it's just that the mean gets a lot of its weight from extreme values, which makes it unrepresentative of the bulk of the data. You might consider emphasizing the median and its nonparametric confidence interval, but also presenting the mean. Even better, present the entire distribution by plotting the empirical cumulative distribution function (ECDF).
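As a rough illustration of the median and its nonparametric confidence interval, here is a minimal Python sketch using only the standard library. The data are synthetic and purely an assumption (95 values near 50 plus 5 very large ones, chosen to mimic the situation described in the question), and `median_ci` is a hypothetical helper name. It uses one common distribution-free construction: the interval is a pair of order statistics whose ranks come from the Binomial(n, 1/2) distribution.

```python
import math
import random
import statistics


def median_ci(values, conf=0.95):
    """Distribution-free confidence interval for the median.

    Returns the order statistics (x_(k), x_(n-k+1)), where k is the largest
    rank with P(B <= k - 1) <= (1 - conf) / 2 for B ~ Binomial(n, 1/2).
    The actual coverage is at least `conf` for continuous data.
    """
    x = sorted(values)
    n = len(x)
    alpha = 1.0 - conf
    cdf = 0.0
    k = 0
    for j in range(n + 1):
        cdf += math.comb(n, j) / 2 ** n  # adds P(B = j)
        if cdf > alpha / 2:
            break
        k = j + 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    return x[k - 1], x[n - k]


# Synthetic data (an assumption, not the asker's values):
# most values near 50, a few much larger ones.
random.seed(0)
values = [random.gauss(50, 5) for _ in range(95)]
values += [random.uniform(500, 1200) for _ in range(5)]

mean_value = statistics.mean(values)
median_value = statistics.median(values)
low, high = median_ci(values)

print(f"mean   = {mean_value:.1f}")
print(f"median = {median_value:.1f}, 95% CI ({low:.1f}, {high:.1f})")
```

With data shaped like this, the mean is pulled well above the bulk of the values, while the median and its interval stay close to where most observations lie.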

You presented the data in a non-traditional way. Reverse the x and y axes. Then superimpose the ECDF.
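For reference, the ECDF itself is simple to compute: sort the values and assign the i-th smallest value the cumulative proportion i/n. The sketch below is only an illustration with synthetic data (an assumption, not the values from the question), and `ecdf` and `ecdf_at` are hypothetical helper names. The resulting pairs can be passed to any plotting tool as a step function.

```python
import bisect
import random


def ecdf(values):
    """Return sorted values and the cumulative proportion at each of them."""
    xs = sorted(values)
    n = len(xs)
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps


def ecdf_at(xs, t):
    """Proportion of the sorted sample xs that is less than or equal to t."""
    return bisect.bisect_right(xs, t) / len(xs)


# Synthetic data (an assumption): most values near 50, a few much larger.
random.seed(0)
values = [random.gauss(50, 5) for _ in range(95)]
values += [random.uniform(500, 1200) for _ in range(5)]

xs, ps = ecdf(values)

# Read proportions straight off the ECDF.
print("proportion <= 60: ", ecdf_at(xs, 60))
print("proportion <= 100:", ecdf_at(xs, 100))
```

Plotting `xs` against `ps` as a step curve shows the whole distribution at once: the steep rise marks where most values sit, and the long flat stretch to the right exposes the few extreme ones.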

Frank Harrell