Why is a mixture of two normally distributed variables only bimodal if their means differ by at least two times the common standard deviation?

Question

Under mixture of two normal distributions:

https://en.wikipedia.org/wiki/Multimodal_distribution#Mixture_of_two_normal_distributions

"A mixture of two normal distributions has five parameters to estimate: the two means, the two variances and the mixing parameter. A mixture of two normal distributions with equal standard deviations is bimodal only if their means differ by at least twice the common standard deviation."

I am looking for a derivation or intuitive explanation of why this is true. I believe it might be explained in terms of a two-sample t statistic:

$$\frac{\mu_1-\mu_2}{\sigma_p}$$

where $\sigma_p$ is the pooled standard deviation.

The intuition is that, if the means are too close, there will be so much overlap in the mass of the two densities that the difference in means won't be seen, because it just gets glopped in with the mass of the two densities. If the two means are different enough, then the masses of the two densities won't overlap much and the difference in the means will be discernible. But I'd like to see a mathematical proof of this. It's an interesting statement. I never saw it before. — mlofton, Jul 05 '19 at 21:09
More formally, for a 50:50 mixture of two normal distributions with the same SD $\sigma,$ if you write the density $f(x) = 0.5g_1(x) + 0.5g_2(x)$ in full form showing the parameters, you will see that its second derivative changes sign at the midpoint between the two means when the distance between means increases from below $2\sigma$ to above. — BruceET, Jul 05 '19 at 21:45
See "Rayleigh Criterion," https://en.wikipedia.org/wiki/Angular_resolution#Explanation — Carl Witthoft, Jul 08 '19 at 13:12

score 56 · Accepted Answer · edited Jul 08 '19 at 16:23

This figure from the paper linked in that wiki article provides a nice illustration:

The proof they provide is based on the fact that a normal pdf is concave within one SD of its mean (its inflection points, where it changes from concave to convex, lie exactly one SD either side of the mean). Thus, if you add two normal pdfs together (in equal proportions), then as long as their means differ by less than two SDs, the sum-pdf (i.e. the mixture) will be concave in the region between the two means. Since both components rise to the left of that region and fall to the right of it, the global maximum must lie between the means, and by symmetry it sits at the point exactly halfway between them.
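
The concavity claim can be checked numerically. The R sketch below is only an illustration, not the paper's proof: the helper names `d2norm`, `mix_pdf` and `mix_d2`, the choice $\sigma = 1$, and the separation of $1.8\sigma$ are all assumptions made for this example. It relies on the closed-form second derivative of a normal density with mean $m$ and SD $s$, namely $\left((x-m)^2/s^2 - 1\right)/s^2$ times the density itself.

```r
# Second derivative of a normal pdf with mean m and SD s (closed form).
d2norm <- function(x, m, s) ((x - m)^2 / s^2 - 1) / s^2 * dnorm(x, m, s)

# Equal (50:50) mixture of N(m1, s) and N(m2, s): density and second derivative.
mix_pdf <- function(x, m1, m2, s) 0.5 * dnorm(x, m1, s) + 0.5 * dnorm(x, m2, s)
mix_d2  <- function(x, m1, m2, s) 0.5 * d2norm(x, m1, s) + 0.5 * d2norm(x, m2, s)

s  <- 1
m1 <- 0
m2 <- 1.8 * s                        # means closer than 2 SDs (assumed example)

xs <- seq(m1, m2, length.out = 501)  # grid covering the region between the means
concave_between <- all(mix_d2(xs, m1, m2, s) < 0)

# Location of the maximum over a wide interval; it should land near the midpoint.
peak <- optimize(mix_pdf, interval = c(m1 - 5 * s, m2 + 5 * s),
                 m1 = m1, m2 = m2, s = s, maximum = TRUE)$maximum

print(concave_between)
print(c(peak = peak, midpoint = (m1 + m2) / 2))
```

Changing `m2` to a value above `2 * s` should make `concave_between` come out `FALSE`, because the second derivative at the midpoint turns positive.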

Reference: Schilling, M. F., Watkins, A. E., & Watkins, W. (2002). Is Human Height Bimodal? The American Statistician, 56(3), 223–229. doi:10.1198/00031300265

edited Jul 08 '19 at 16:23 by Axeman · answered Jul 05 '19 at 21:51 by Ruben van Bergen

+1 This is a nice, memorable argument. – whuber Jul 05 '19 at 22:11
The figure caption also provides a nice illustration of the 'fl' ligature being misrendered in 'inflection' :-P – nekomatic Jul 08 '19 at 14:54
@Axeman: Thanks for adding that reference - since this blew up a bit I had been planning to add it myself, since I'm really just repeating their argument and I don't want to take too much credit for that. – Ruben van Bergen Jul 08 '19 at 16:38

score 15 · Answer 2 · edited Jul 08 '19 at 12:50

This is a case where pictures can be deceiving, because this result is a special characteristic of normal mixtures: an analog does not necessarily hold for other mixtures, even when the components are symmetric unimodal distributions! For instance, an equal mixture of two Student t distributions separated by a little less than twice their common standard deviation will be bimodal. For real insight then, we have to do some math or appeal to special properties of Normal distributions.
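
The Student t remark can be illustrated with a small grid-based mode count. This is a rough sketch under stated assumptions: 3 degrees of freedom, means 1.8 SDs apart, and the helper `count_modes` are all choices made for this example rather than anything specified in the answer. A normal mixture with the same SD and separation is included for comparison.

```r
# Count local maxima of a density evaluated on a grid (simple sign-change rule).
count_modes <- function(dens, grid) {
  s <- sign(diff(dens(grid)))
  s <- s[s != 0]              # ignore exact ties between neighbouring grid values
  sum(diff(s) == -2)          # a rise followed by a fall marks a local maximum
}

df   <- 3                     # assumed degrees of freedom (variance exists for df > 2)
sd_t <- sqrt(df / (df - 2))   # SD of a Student t with df degrees of freedom
half <- 0.9 * sd_t            # means at -half and +half, i.e. 1.8 SDs apart

# Equal mixture of two shifted Student t densities.
t_mix <- function(x) 0.5 * dt(x - half, df) + 0.5 * dt(x + half, df)
# Equal mixture of two normals with the same SD and the same separation.
n_mix <- function(x) 0.5 * dnorm(x, half, sd_t) + 0.5 * dnorm(x, -half, sd_t)

grid <- seq(-10, 10, length.out = 4001)
print(c(t_modes = count_modes(t_mix, grid),
        normal_modes = count_modes(n_mix, grid)))
```

The grid count is a numerical check rather than a proof; it can miss modes that are closer together than the grid spacing.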

Choose units of measurement (by recentering and rescaling as needed) to place the means of the component distributions at $\pm\mu,$ $\mu\ge 0,$ and to make their common variance unity. Let $p,$ $0 \lt p \lt 1,$ be the amount of the larger-mean component in the mixture. This enables us to express the mixture density in full generality as

$$\sqrt{2\pi}f(x;\mu,p) = p \exp\left(-\frac{(x-\mu)^2}{2}\right) + (1-p) \exp\left(-\frac{(x+\mu)^2}{2}\right).$$

Because both component densities increase where $x\lt -\mu$ and decrease where $x\gt \mu,$ the only possible modes occur where $-\mu\le x \le \mu.$ Find them by differentiating $f$ with respect to $x$ and setting it to zero. Clearing out any positive coefficients we obtain

$$0 = e^{2x\mu} p(x-\mu) + (1-p)(x+\mu).$$

Performing similar operations with the second derivative of $f$ and replacing $e^{2x\mu}$ by the value determined by the preceding equation tells us the sign of the second derivative at any critical point is the sign of

$$f^{\prime\prime}(x;\mu,p) \propto \frac{(1+x^2-\mu^2)}{x-\mu}.$$

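Since the denominator is negative when $-\mu\lt x \lt \mu,$ the sign of $f^{\prime\prime}$ at a critical point is that of $-(1-\mu^2 + x^2).$ When $\mu\lt 1$ this is strictly negative, and when $\mu = 1$ it is negative except at $x=0,$ where it vanishes. (The point $x=0$ is critical only when $p=1/2,$ and in that case $f(x)\propto e^{-x^2/2}\cosh(x)$ still has a maximum there.) In a multimodal distribution, however (because the density is continuous), there must be an antimode between any two modes, where the second derivative is non-negative. Thus, when $\mu$ is no greater than $1$ (the SD), the distribution must be unimodal.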
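
For readers who want the intermediate algebra, the substitution step can be written out as follows. This is a reconstruction under the same setup (means at $\pm\mu$ with $\mu \gt 0$, unit variance); the shorthand $E = e^{2x\mu}$ is introduced here only for brevity. Multiplying the first and second derivatives by the positive factor $\sqrt{2\pi}\,e^{(x+\mu)^2/2}$ gives

$$\begin{aligned}
\sqrt{2\pi}\,e^{(x+\mu)^2/2}\,f^{\prime}(x) &= -\left[pE(x-\mu) + (1-p)(x+\mu)\right],\\
\sqrt{2\pi}\,e^{(x+\mu)^2/2}\,f^{\prime\prime}(x) &= pE\left[(x-\mu)^2-1\right] + (1-p)\left[(x+\mu)^2-1\right].
\end{aligned}$$

At a critical point the first line vanishes, so $pE = -(1-p)(x+\mu)/(x-\mu).$ Substituting this into the second line yields

$$\sqrt{2\pi}\,e^{(x+\mu)^2/2}\,f^{\prime\prime}(x) = \frac{1-p}{x-\mu}\left[(x+\mu)^2(x-\mu) - (x-\mu) - (x+\mu)(x-\mu)^2 + (x+\mu)\right] = 2\mu(1-p)\,\frac{1+x^2-\mu^2}{x-\mu},$$

and since $2\mu(1-p) \gt 0,$ the sign of $f^{\prime\prime}$ at a critical point is the sign of $(1+x^2-\mu^2)/(x-\mu).$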

Since the separation of the means is $2\mu,$ the conclusion of this analysis is

A mixture of Normal distributions is unimodal whenever the means are separated by no more than twice the common standard deviation.

That's logically equivalent to the statement in the question.
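
As a numerical sanity check of this conclusion, the following R sketch counts modes on a grid for a range of $\mu$ and $p$. The helper names `mix_pdf` and `count_modes`, the grid, and the particular values tried are assumptions made for illustration; a grid count cannot resolve modes closer together than the grid spacing, so this supports the result rather than proving it.

```r
# Mixture density with means at -mu and +mu, unit variance,
# and weight p on the +mu component (the setup used in this answer).
mix_pdf <- function(x, mu, p) p * dnorm(x, mu, 1) + (1 - p) * dnorm(x, -mu, 1)

# Count local maxima on a grid by looking for rise-then-fall sign changes.
count_modes <- function(mu, p, grid = seq(-8, 8, length.out = 8001)) {
  s <- sign(diff(mix_pdf(grid, mu, p)))
  s <- s[s != 0]              # ignore exact ties between neighbouring grid values
  sum(diff(s) == -2)
}

# Every combination with mu <= 1 (means at most 2 SDs apart) should give one mode.
cases <- expand.grid(mu = seq(0.1, 1, by = 0.1), p = seq(0.05, 0.95, by = 0.05))
cases$modes <- mapply(count_modes, cases$mu, cases$p)
print(table(cases$modes))

# Beyond 2 SDs the mixture can be bimodal, but need not be when p is far from 1/2.
print(c(balanced = count_modes(1.1, 0.5), lopsided = count_modes(1.1, 0.1)))
```

The last line illustrates why the statement is "only if" rather than "if and only if": a separation above two SDs is necessary for bimodality, but a sufficiently unequal mixture can still be unimodal.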

score 13 · Answer 3 · edited Jul 06 '19 at 17:51

Comment from above pasted here for continuity:

"[F]ormally, for a 50:50 mixture of two normal distributions with the same SD σ, if you write the density $$f(x)=0.5g_1(x)+0.5g_2(x)$$ in full form showing the parameters, you will see that its second derivative changes sign at the midpoint between the two means when the distance between means increases from below 2σ to above."

Comment continued:

In each case the two normal curves that are 'mixed' have $\sigma=1.$ From left to right the distances between means are $3\sigma, 2\sigma,$ and $\sigma,$ respectively. The second derivative of the mixture density at the midpoint (1.5) between means changes from positive (a dip), to zero, to negative (a peak).

R code for the figure:

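# Three panels side by side; each shows a 50:50 mixture of two unit-SD normals.
# The 0.5 weights make each curve a proper density (it integrates to 1).
op <- par(mfrow=c(1,3))
  # Means 3 SD apart: the midpoint (1.5) is a dip between two modes.
  curve(0.5*dnorm(x, 0, 1)+0.5*dnorm(x,3,1), -3, 7, col="green3", 
    lwd=2, n=1001, ylab="PDF", main="3 SD: Dip")
  # Means 2 SD apart: zero second derivative at the midpoint.
  curve(0.5*dnorm(x, .5, 1)+0.5*dnorm(x,2.5,1), -4, 7, col="orange", 
    lwd=2, n=1001, ylab="PDF", main="2 SD: Flat")
  # Means 1 SD apart: a single peak at the midpoint.
  curve(0.5*dnorm(x, 1, 1)+0.5*dnorm(x,2,1), -4, 7, col="violet", 
    lwd=2, n=1001, ylab="PDF", main="1 SD: Peak")
par(op)  # restore the previous plotting layout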
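
The sign of the second derivative at the midpoint can also be computed directly rather than read off the plots. The sketch below uses the closed-form second derivative of a normal density; the helper names `d2norm` and `mid_d2` are made up for this illustration. A positive value corresponds to a dip at the midpoint and a negative value to a peak.

```r
# Closed-form second derivative of a normal pdf with mean m and SD s.
d2norm <- function(x, m, s = 1) ((x - m)^2 / s^2 - 1) / s^2 * dnorm(x, m, s)

# Second derivative of the 50:50 mixture at the midpoint between the two means.
mid_d2 <- function(m1, m2, s = 1) {
  mid <- (m1 + m2) / 2
  0.5 * d2norm(mid, m1, s) + 0.5 * d2norm(mid, m2, s)
}

# Same three pairs of means as in the figure code: 3, 2 and 1 SD apart.
print(c("3 SD" = mid_d2(0, 3), "2 SD" = mid_d2(0.5, 2.5), "1 SD" = mid_d2(1, 2)))
```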

edited Jul 06 '19 at 17:51 · answered Jul 05 '19 at 22:17 by BruceET

all of the answers were great. thanks. – mlofton Jul 06 '19 at 02:49
It may be worth noting that although the middle figure ("2 SD: Flat") *looks* flat near the center, it is in fact unimodal with a global maximum at the center. The "flat" part corresponds to a central region of width slightly more than $2/3$, where the density departs from the maximum by less than $0.001.$ – r.e.s. Jul 09 '19 at 01:26
My previous comment should have said "where the density departs from the maximum by less than $0.1\%$ *of the maximum*." More precisely, in this case $f$ has a global maximum at the center (say $x_0)$, and $$f(x_0)-f(x)\le 0.001 f(x_0)\ \iff\ |x-x_0|\le 0.333433,$$ whereas the width of the region where the departure is less than $0.001$ is larger, approximately $0.95832$: $$f(x_0)-f(x)\le 0.001\ \iff\ |x-x_0|\le 0.47916.$$ – r.e.s. Jul 09 '19 at 13:35
Good points. Actually, what I meant by abbreviated language 'flat' was zero 2nd derivative exactly at the midpoint. – BruceET Jul 09 '19 at 18:06
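
The half-widths quoted in the comments above can be reproduced, at least approximately, with a root finder. This sketch assumes the normalised 50:50 mixture of N(0.5, 1) and N(2.5, 1), which appears to be what those figures refer to; the names `f`, `drop_at`, `rel_half` and `abs_half` are introduced here for illustration.

```r
# 50:50 mixture of N(0.5, 1) and N(2.5, 1): the "2 SD: Flat" case as a proper pdf.
f  <- function(x) 0.5 * dnorm(x, 0.5, 1) + 0.5 * dnorm(x, 2.5, 1)
x0 <- 1.5                                   # midpoint, where the maximum sits
drop_at <- function(h) f(x0) - f(x0 + h)    # shortfall from the maximum at offset h

# Half-width where the shortfall reaches 0.1% of the maximum (relative criterion).
rel_half <- uniroot(function(h) drop_at(h) - 0.001 * f(x0),
                    interval = c(0.01, 1), tol = 1e-10)$root
# Half-width where the shortfall reaches 0.001 in absolute terms.
abs_half <- uniroot(function(h) drop_at(h) - 0.001,
                    interval = c(0.01, 1), tol = 1e-10)$root

print(c(relative = rel_half, absolute = abs_half))
```

The two printed half-widths should come out close to the values given in the comment; doubling them gives the corresponding widths of the nearly flat region.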
