
I am looking at a set of normally distributed data and trying to figure out why my intuition is wrong here.

If I have multiple normal distributions $N_m(M_m, S_m)$, the literature tells me that I can join them all into a single distribution $N(M, S)$ such that:

$$ M = \sum_m M_m / m \\ S = \sum_m S_m^2 / m^2 $$

And if each has a weight $W_m$ (such that $\sum_m W_m = 1$):

$$ M = \sum_m W_m M_m \\ S = {\sum_m W_m^2 S_m^2 } $$

The formulas above are wrong for the operation I am trying to achieve! (I know this now that Tim, below, has answered my question.) I am leaving them here in case somebody finds this question while having a similar issue.

What could the data be? Let's say I have 5 classes, each with its own grade distribution, and I want to aggregate them into a single one.

So if I have:

N1(0, 3)
N2(0, 3)
N3(0, 3)
N4(0, 3)
N5(0, 3)

(yeah they are bad! and even have negative grades)

My aggregated $N(M,S)$ is:

$$ M = 0 $$

and

$$ S = {(0.2^2 \times 3^2) \times 5} = 1.8 $$

And here is where my intuition fails. How can my final distribution be $N(0, 1.8)$? Shouldn't I have $N(0, 3)$, as they are all the same?

Also, notice that with weights I will get a similar result unless I make one of them 99% and the rest a small % each. That makes me wonder whether the formulas I use are correct, but the result is not an aggregated distribution representing the other 5 and is instead something else (as in, the distribution of picking a result from each distribution, or something like that).

I hope somebody can help me understand the concept behind these results.

Yona
  • As a sidenote $\sqrt{3^2 \times 5} \ne 1.8$ and $\sum_5 5 \ne 1$ – Tim Dec 13 '16 at 10:50
  • Thanks Tim. Obviously 5 N() will mean a weight of 20% for each. The correct formula is now in (so as not to confuse future readers). – Yona Dec 13 '16 at 14:02
  • thanks. My answer still applies, please comment if it is unclear or something needs to be made more precise. – Tim Dec 13 '16 at 14:06

1 Answer


You seem to be talking about a mixture distribution. A mixture distribution $f$ is a mixture of several other distributions $f_i$, for example normal distributions parametrized by means $\mu_i$ and variances $\sigma^2_i$, appearing with mixing proportions $p_i$ (such that $\sum_i p_i = 1$):

$$ f(x) = \sum_i p_i f_i(x; \mu_i, \sigma^2_i) $$

In the case of a mixture of normal distributions, the combined mean and variance are (Behboodian, 1970)

$$ \mu_\text{comb} = \sum_i p_i \mu_i \\ \sigma^2_\text{comb} = \sum_i p_i (\sigma^2_i + \mu_i^2) - \Big( \sum_i p_i \mu_i \Big)^2 $$
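
As a sketch of where the variance formula comes from (not necessarily how the cited paper derives it): write $E_i$ for the expectation under component $f_i$, a notation introduced here only for illustration. The second moment of the mixture is the weighted average of the component second moments, and the variance then follows from the usual identity $\operatorname{Var}(X) = E[X^2] - (E[X])^2$:

$$ E[X] = \sum_i p_i E_i[X] = \sum_i p_i \mu_i \\ E[X^2] = \sum_i p_i E_i[X^2] = \sum_i p_i (\sigma^2_i + \mu_i^2) \\ \sigma^2_\text{comb} = E[X^2] - (E[X])^2 = \sum_i p_i (\sigma^2_i + \mu_i^2) - \Big( \sum_i p_i \mu_i \Big)^2 $$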

So in your case, the combined variance would be

$$ \sum_i p_i (\sigma^2_i + \mu_i^2) - \Big( \sum_i p_i \mu_i \Big)^2 = \\ \sum_i 1/5 \times (3^2 + 0) - \Big( \sum_i 1/5 \times 0 \Big)^2 = \\ 5 \times 1/5 \times 9 - 0 = 9 $$

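Which is pretty obvious, given that we are talking about a "mixture" of identical distributions: the combined variance is 9, so the standard deviation is 3, and the result is the same $N(0, 3)$ you started with.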
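
To make the contrast concrete, here is a minimal Python sketch. The function names `mixture_moments` and `weighted_sum_moments` are made up for illustration and do not come from any library. The first applies the mixture formula above; the second computes the moments of a weighted sum of independent normal variables, which is what the formulas in the question describe. Both take standard deviations as input, assuming the second parameter of $N(0, 3)$ is a standard deviation.

```python
import math


def mixture_moments(weights, means, sds):
    """Mean and variance of a mixture with mixing proportions `weights`."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("mixing proportions must sum to 1")
    mean = sum(p * m for p, m in zip(weights, means))
    # E[X^2] of the mixture is the weighted average of component second moments.
    second_moment = sum(p * (s**2 + m**2) for p, m, s in zip(weights, means, sds))
    return mean, second_moment - mean**2


def weighted_sum_moments(weights, means, sds):
    """Mean and variance of sum_i w_i * X_i for independent X_i ~ N(mean_i, sd_i^2)."""
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w**2 * s**2 for w, s in zip(weights, sds))
    return mean, var


# The example from the question: five identical N(0, 3) distributions, equal weights.
weights = [0.2] * 5
means = [0.0] * 5
sds = [3.0] * 5

mix_mean, mix_var = mixture_moments(weights, means, sds)
sum_mean, sum_var = weighted_sum_moments(weights, means, sds)

print("mixture:      mean", mix_mean, "variance", mix_var)  # variance is about 9
print("weighted sum: mean", sum_mean, "variance", sum_var)  # variance is about 1.8
```

The weighted-sum variance of about 1.8 is the spread of the average of one draw from each class, while the mixture variance of about 9 is the spread of a single grade drawn from the pooled classes.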


Behboodian, J. (1970). On a mixture of normal distributions. Biometrika, 57(1), 215-217.

Tim
  • @Yona: were you talking about a mixture distribution, though, or were you talking about the distribution of a sum or mean of several random variables? Because these are not the same thing, and in fact the equations in your question seem to refer to the latter. – Ruben van Bergen Dec 13 '16 at 15:28
  • @RubenvanBergen this is also what I initially thought, and a previous version of this answer also mentioned this case, but if you read the question carefully, it relates directly to dealing with a mixture of distributions rather than with a sum of random variables. – Tim Dec 13 '16 at 15:32
  • @RubenvanBergen Tim is correct here: as I was trying to figure out what Tim shows I kept finding the formulas on my question. But I could see that was incorrect as when applied the results were not displaying the behavior I expected. That's the problem with "Googling" for a question, you sometimes may find people have phrased them wrong. That is why I hope this question with the example will help people with a similar query in the future. – Yona Dec 13 '16 at 16:16