
I have heard that taking the mean of means (an average of averages) is bad, but I don't understand exactly why.

For example, in my case I have the mean sales for each of 10 stores. Why is it bad for me to take the mean of these per-store means to get the mean store sales for the entire company?

Anton
  • This question is already answered [here](https://math.stackexchange.com/questions/95909/why-is-an-average-of-an-average-usually-incorrect) – Atinesh Jan 18 '18 at 11:53

3 Answers


The mean of means is not necessarily equal to the mean of the total population. For example, suppose the population has two subsets: one of size 5 (taking the values 1, 2, 3, 4, 5) and one of size 2 (taking the values 1, 2). Let $m_1$ be the mean of the first subset and $m_2$ the mean of the second.

$m_1 = (1+2+3+4+5)/5 = 3$

$m_2 = (1+2)/2 = 3/2$

Taking the mean of the means

$M = (3+3/2)/2 = 9/4$

however the total population mean is

$M_{pop} = (1+1+2+2+3+4+5)/7 = 18/7$.

Clearly $M \neq M_{pop}$. This happens because we didn't take the sizes of the subpopulations into account. The same problem arises in practical settings where these sizes are unknown, which is why taking the mean of means can be dangerous. If, however, you know the sizes of the subpopulations, a weighted average resolves the discrepancy:

$M_{weighted} = \frac{5}{7}m_1+\frac{2}{7} m_2= 18/7$

So taking into account the right weights leads to $M_{weighted}= M_{pop}$.
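The arithmetic above can be checked with a short Python sketch. It uses exact fractions (Python's standard `fractions.Fraction`) so that results such as $9/4$ and $18/7$ come out exactly rather than as rounded floats; the variable names are illustrative only.

```python
from fractions import Fraction

# The two subpopulations from the example above
group_1 = [1, 2, 3, 4, 5]
group_2 = [1, 2]

# Per-group means as exact fractions
m_1 = Fraction(sum(group_1), len(group_1))  # 3
m_2 = Fraction(sum(group_2), len(group_2))  # 3/2

# Unweighted mean of the two group means
M = (m_1 + m_2) / 2  # 9/4

# Mean of the pooled population
pooled = group_1 + group_2
M_pop = Fraction(sum(pooled), len(pooled))  # 18/7

# Weighted mean: each group mean weighted by its size
n_1, n_2 = len(group_1), len(group_2)
M_weighted = (n_1 * m_1 + n_2 * m_2) / (n_1 + n_2)  # 18/7

print(M, M_pop, M_weighted)
```

The unweighted mean of means differs from the population mean, while the size-weighted mean matches it.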

user118591
  • This problem is no problem. All it implies is that often (indeed always) you need to make sure that the numerator (total) and denominator (number of values) of your mean make sense. Sometimes that means weights. (In fact, it always means weights, except that we don't worry when the weights are identical.) – Nick Cox Jan 18 '18 at 15:38
  • Indeed, that is what I'm trying to point out with my answer. However, if you only have information about the averages and none about the population sizes, taking the mean of means could lead to an incorrect answer. – user118591 Jan 18 '18 at 15:41
  • Agreed; so I think you can improve your answer to make that point explicit. – Nick Cox Jan 18 '18 at 15:54
  • I updated my answer; hopefully everything is clearer now. – user118591 Jan 18 '18 at 16:02
  • Should start "is not _necessarily_ equal", as equality is entirely possible. – Nick Cox Jan 18 '18 at 16:13

I'll also point out that you run the risk of encountering Simpson's paradox when you take the mean of means rather than the overall mean. In general, this paradox describes a situation where a trend appears in each of several groups but disappears or even reverses when the groups are combined.

Take, for example, the batting averages of two baseball players (A and B) over two seasons (1 and 2). It's possible for player A to have a higher batting average than player B in each season individually, while player B has the higher batting average over both seasons combined. This can happen because the overall average weights each season by its number of at-bats, whereas the mean of the seasonal means ignores the number of at-bats and treats the seasons as equivalent.
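A minimal Python sketch of that situation follows. The hit and at-bat counts are invented purely for illustration (they are not real player statistics); exact fractions avoid rounding noise. Player A leads in each season and in the mean of the seasonal means (13/40 vs 11/40), but player B leads on total hits over total at-bats (37/110 vs 29/110).

```python
from fractions import Fraction

# Hypothetical (hits, at_bats) per season, invented for illustration
seasons = {
    "A": [(4, 10), (25, 100)],
    "B": [(35, 100), (2, 10)],
}


def season_averages(record):
    """Batting average for each season separately."""
    return [Fraction(hits, at_bats) for hits, at_bats in record]


def mean_of_means(record):
    """Unweighted mean of the per-season averages (each season counts equally)."""
    averages = season_averages(record)
    return sum(averages) / len(averages)


def pooled_average(record):
    """Total hits over total at-bats (each at-bat counts equally)."""
    total_hits = sum(hits for hits, _ in record)
    total_at_bats = sum(at_bats for _, at_bats in record)
    return Fraction(total_hits, total_at_bats)


for player, record in seasons.items():
    print(
        player,
        [float(avg) for avg in season_averages(record)],
        round(float(mean_of_means(record)), 3),
        round(float(pooled_average(record)), 3),
    )
```

The reversal comes from the weighting: A's strong season has only 10 at-bats, while B's strong season has 100.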

Nuclear Hoagie

Suppose you have subgroups $i=1,...,m$ each with sample mean $\bar{x}_i$ taken over $n_i$ data points. Then the sample mean for the whole sample (pooling the subgroups) is:

$$\begin{align} \bar{x} &= \frac{n_1 \bar{x}_1 + \cdots + n_m \bar{x}_m}{n_1 + \cdots + n_m} \\[6pt] &= \frac{1}{n_1 + \cdots + n_m} \sum_{i=1}^m n_i \bar{x}_i. \\[6pt] \end{align}$$

So, as you can see, the mean of the whole sample is a weighted average of the subgroup sample means, where each weight is proportional to the size of the subgroup. Consequently, if the subgroups have unequal sizes, the straight (unweighted) mean of their means will not generally equal the sample mean of the whole sample.
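The formula above translates directly into a small Python helper. This is a sketch; the function names `pooled_mean` and `mean_of_means` are hypothetical, and exact fractions are used so equalities can be checked exactly. When all $n_i$ are equal, the weights cancel and the pooled mean coincides with the straight mean of means. In the store setting from the question, that would correspond to every store's mean being computed over the same number of days; the question does not say whether this holds.

```python
from fractions import Fraction


def pooled_mean(sizes, means):
    """Whole-sample mean from subgroup sizes n_i and subgroup means xbar_i.

    Computes (n_1*xbar_1 + ... + n_m*xbar_m) / (n_1 + ... + n_m).
    """
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    total = sum(sizes)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    return sum(Fraction(n) * Fraction(x) for n, x in zip(sizes, means)) / total


def mean_of_means(means):
    """Straight (unweighted) mean of the subgroup means."""
    return sum(Fraction(x) for x in means) / len(means)


# Unequal sizes: the two approaches disagree
print(pooled_mean([5, 2], [3, Fraction(3, 2)]), mean_of_means([3, Fraction(3, 2)]))

# Equal sizes (hypothetical: three groups of 30 observations each): they agree
print(pooled_mean([30, 30, 30], [100, 120, 140]), mean_of_means([100, 120, 140]))
```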

epp
Ben