60

Say I have two normal distributions $A$ and $B$ with means $\mu_A$ and $\mu_B$ and variances $\sigma_A^2$ and $\sigma_B^2$. I want to take a weighted mixture of these two distributions using weights $p$ and $q$, where $0\le p \le 1$ and $q = 1-p$. I know that the mean of this mixture would be $\mu_{AB} = (p\times\mu_A) + (q\times\mu_B)$.

What would the variance be?


A concrete example: suppose I knew the parameters of the height distributions for males and females. If I had a room of people that was 60% male, I could produce the expected mean height for the whole room, but what about the variance?

JoFrhwld
  • Re terminology: The mixture simply has a mean and a variance; there's no sense in qualifying these as "expected," unless you are perhaps hinting that $p$ and $q$ should be considered random variables. – whuber Oct 06 '11 at 16:44
  • I know that the mixture of two Gaussian distributions is identifiable. But what if the two distributions have the same means? That is, is the mixture of two normal distributions with the same means but different standard deviations identifiable? Are there papers on this? Thanks in advance –  Nov 14 '13 at 11:39
  • 1
    There is a similar question with answers (dealing also with the COVARIANCES) here: http://math.stackexchange.com/q/195911/96547 – hplieninger Mar 17 '16 at 10:40

2 Answers

83

The variance is the second moment minus the square of the first moment, so it suffices to compute moments of mixtures.

In general, given distributions with PDFs $f_i$ and constant (non-random) weights $p_i$, the PDF of the mixture is

$$f(x) = \sum_i{p_i f_i(x)},$$

from which it follows immediately for any moment $k$ that

$$\mu^{(k)} = \mathbb{E}_{f}[x^k] = \sum_i{p_i \mathbb{E}_{f_i}[x^k]} = \sum_i{p_i \mu_i^{(k)}}.$$

I have written $\mu^{(k)}$ for the $k^{th}$ moment of $f$ and $\mu_i^{(k)}$ for the $k^{th}$ moment of $f_i$.

Using these formulae, the variance can be written

$$\text{Var}(f) = \mu^{(2)} - \left(\mu^{(1)}\right)^2 = \sum_i{p_i \mu_i^{(2)}} - \left(\sum_i{p_i \mu_i^{(1)}}\right)^2.$$

Equivalently, if the variances of the $f_i$ are given as $\sigma^2_i$, then $\mu^{(2)}_i = \sigma^2_i + \left(\mu^{(1)}_i\right)^2$, enabling the variance of the mixture $f$ to be written in terms of the variances and means of its components as

$$\eqalign{ \text{Var}(f) &= \sum_i{p_i \left(\sigma^2_i + \left(\mu^{(1)}_i\right)^2\right)} - \left(\sum_i{p_i \mu_i^{(1)}}\right)^2 \\ &= \sum_i{p_i \sigma^2_i} + \sum_i{p_i\left(\mu_i^{(1)}\right)^2} - \left(\sum_{i}{p_i \mu_i^{(1)}}\right)^2. }$$

In words, this is the (weighted) average variance plus the average squared mean minus the square of the average mean. Because squaring is a convex function, Jensen's Inequality asserts that the average squared mean can be no less than the square of the average mean. This allows us to understand the formula as stating that the variance of the mixture is the mixture of the variances plus a non-negative term accounting for the (weighted) dispersion of the means.
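As a minimal sketch in Python (the function name `mixture_variance` is just for illustration), the formula can be computed directly from the weights, component means, and component variances:

```python
import numpy as np

def mixture_variance(p, mu, var):
    """Variance of a mixture distribution, given component weights p,
    component means mu, and component variances var."""
    p, mu, var = np.asarray(p), np.asarray(mu), np.asarray(var)
    mean = np.sum(p * mu)                      # first moment of the mixture
    second_moment = np.sum(p * (var + mu**2))  # mixture of second moments
    return second_moment - mean**2
```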

In your case the variance is

$$p_A \sigma_A^2 + p_B \sigma_B^2 + \left[p_A\mu_A^2 + p_B\mu_B^2 - (p_A \mu_A + p_B \mu_B)^2\right].$$

We can interpret this as a weighted mixture of the two variances, $p_A\sigma_A^2 + p_B\sigma_B^2$, plus a (necessarily non-negative) correction term accounting for the shifts of the individual means relative to the overall mixture mean.

The utility of this variance in interpreting data, such as given in the question, is doubtful, because the mixture distribution will not be Normal (and may depart substantially from it, to the extent of exhibiting bimodality).
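Returning to the room-of-people example, a quick Monte Carlo simulation confirms the closed-form result; the height parameters below are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.6, 0.4])          # 60% male, 40% female
mu = np.array([178.0, 165.0])     # hypothetical mean heights in cm
var = np.array([7.0**2, 6.0**2])  # hypothetical variances

# Closed-form variance from the formula above.
closed_form = np.sum(p * (var + mu**2)) - np.sum(p * mu)**2
print(closed_form)  # about 84.36

# Monte Carlo check: each person is drawn from one component.
n = 1_000_000
from_A = rng.random(n) < p[0]
heights = np.where(from_A,
                   rng.normal(mu[0], np.sqrt(var[0]), n),
                   rng.normal(mu[1], np.sqrt(var[1]), n))
print(heights.var())  # close to the closed-form value
```

Note that the simulated histogram will also show how far the mixture departs from a single Normal, per the caveat above.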

whuber
  • 11
    In particular, noting that $p_A+p_B=1$, your last expression simplifies to $\sigma^2=\mu^{(2)}-\mu^2=p_A\sigma_A^2+p_B\sigma_B^2+p_Ap_B(\mu_A-\mu_B)^2$. – Ilmari Karonen Jun 20 '13 at 18:46
  • 3
    Or, if we do impose a probabilistic explanation for a mixture density (there is an event $A$ of probability $p_A$ and the _conditional_ density of $X$ given $A$ is $N(\mu_A,\sigma_A^2)$ while the _conditional_ density of $X$ given $A^c = B$ is $N(\mu_B,\sigma_B^2)$), then var$(X)$ is the mean of the conditional variance plus the variance of the conditional mean. The latter is a discrete RV $Y$ with values $\mu_A, \mu_B$ with probabilities $p$ and $q$, and your expression in square brackets is readily recognized to be $E[Y^2]-(E[Y])^2$. – Dilip Sarwate May 20 '15 at 15:25
  • @Dilip That is an excellent and elegant observation--the *Law of Total Variance.* – whuber May 20 '15 at 15:27
  • Perhaps you could incorporate the Law of Total Variance into your answer (I had mentioned this in the [other question that you recently closed](http://stats.stackexchange.com/q/153105/6633))? Especially since your more general formula $\sum_i p_i \left(\mu_i^{(1)}\right)^2$ etc is also readily recognizable as $E[Y^2]-(E[Y])^2$ where $Y$ takes on values $\mu_i^{(1)}$ with probability $p_i$ – Dilip Sarwate May 20 '15 at 15:49
  • I don't understand why the second moment of component $i$ is equal to the variance of the $i$th component plus the squared mean of the $i$th component: I was expecting the variance minus the mean squared? – Neodyme Feb 09 '16 at 18:33
  • 1
    @Neodyme By definition, the variance is the second moment minus the mean squared. Therefore, the second moment is the variance *plus* the mean squared. – whuber Feb 09 '16 at 21:05
  • Thanks, so if we use $E[(X-\mu)^2]$ we obtain $ E[X^2 + \mu^2 - 2X\mu] $ ... which is indeed the variance plus the mean squared, however with an extra term added. I'm almost there, but I still don't understand what happens to the last term $2X\mu$ ? – Neodyme Feb 10 '16 at 12:31
  • 1
    @Neodyme use $E(X)=\mu$. – whuber Feb 10 '16 at 14:15
  • @whuber Regarding interpretability, the mixture distribution would be normal if all the means were 0, correct? – Kiran K. Oct 11 '16 at 18:20
  • 1
    @Kiran Although in some cases the mixture might *look* Normal, it will not be. One way to see that is to compute its excess kurtosis using the formulas given here. It will be nonzero unless all the standard deviations are equal--in which case the "mixture" isn't really a mixture in the first place. – whuber Oct 11 '16 at 18:38
  • @whuber, where you've used $\mu_i^{(1)}$, I wonder if it should be replaced by $\mu_i^{(1)} - \mu^{(1)}$. As it stands now, I believe the formula for the variance of the mixture depends on $\mu^{(1)}$. For example, if $\mu = \mu_1 = \mu_2 = 0$, the second and third term in your final equation for $\text{Var}(f)$ drop out. But if, $\mu = \mu_1 = \mu_2 = 3$, they do not. – rcorty Nov 08 '17 at 19:21
  • @rcorty Thank you for looking so closely. I checked, but cannot find an error of reasoning or algebra. I don't think I follow your argument, since of course the formula must involve $\mu^{(1)}$ and indeed that term is involved (if you expand the sums). – whuber Nov 08 '17 at 20:20
  • @whuber Can this approach be generalized for n normal distributions? – RMMA Aug 20 '18 at 15:50
  • @RMMA It doesn't need to be generalized: it explicitly handles that case already in the first two formulas. – whuber Aug 20 '18 at 15:52
6

whuber's solution is perfect, but a step seems to be missing to connect this result with the LTV (law of total variance). Writing $\mu = p_A\mu_A + p_B\mu_B$ for the mixture mean, the previous result $$\sigma^2=p_A \sigma_A^2+p_B \sigma_B^2+p_A \mu_A^2+p_B \mu_B^2-\mu^2$$

can be rewritten by noting that $2p_A\mu_A\mu +2p_B\mu_B\mu=2\mu(p_A\mu_A+p_B\mu_B)=2\mu^2$, so

$$\sigma^2=p_A \sigma_A^2+p_B \sigma_B^2+p_A \mu_A^2+p_B \mu_B^2+\mu^2 -2p_A\mu_A\mu -2p_B\mu_B\mu$$

and finally

$$\sigma^2=p_A \sigma_A^2+p_B \sigma_B^2+p_A (\mu_A - \mu)^2+p_B(\mu_B-\mu)^2,$$ which is the typical LTV expression we are used to seeing.
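A quick numerical check in Python (with arbitrary illustrative numbers) confirms that the moment form and the LTV form agree:

```python
import numpy as np

p = np.array([0.6, 0.4])
mu = np.array([178.0, 165.0])
var = np.array([49.0, 36.0])

m = np.sum(p * mu)                                    # mixture mean
form1 = np.sum(p * var) + np.sum(p * mu**2) - m**2    # moment form
form2 = np.sum(p * var) + np.sum(p * (mu - m)**2)     # LTV form
print(form1, form2)  # identical up to floating-point error
```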

kjetil b halvorsen
JGiner