
I am given a population $P$ that is equally divided into subsets $A$ and $B$. I know that a property $H$ of the population $P$ is normally distributed with mean $\mu_1$ and variance $\sigma_1^2$ for subset $A$ and with mean $\mu_2$ and variance $\sigma_2^2$ for subset $B$. The task at hand is to find the exact median value of $H$ for the entire population $P$.

This is a Gaussian mixture with equal weights. I know that for a Gaussian distribution, the mean and the median are equal. I am trying to set up an equation based on the given data that is going to allow me to determine the median. I believe that the density of the Gaussian mixture is:

$$ f(h) = 0.5f_A(h) + 0.5f_B(h) $$

where $f_A$ is the Gaussian density for subset $A$ and $f_B$ is the Gaussian density for subset $B$. Now the standard way to handle this problem for continuous random variables is to solve the following equation for $m$:

$$ \int_m^{\infty} f(h)dh = \frac{1}{2} $$

This does not look like a good strategy for this problem (for computational reasons -- although I may be totally off). If anyone could suggest a nicer, more elegant approach, I would deeply appreciate it!

Orest Xherija
  • Is this a homework problem? If so, can you read and add the [self-study](http://stats.stackexchange.com/tags/self-study/info) tag? – Andrew M Oct 03 '16 at 23:35
  • @DilipSarwate I am stating in the beginning that the subpopulations are of the same size. – Orest Xherija Oct 03 '16 at 23:45
  • @AndrewM No, I encountered a version of the problem in a machine learning context and was trying to see how to approach it rigorously. Should I still add the tag, since it is, literally, self-study? – Orest Xherija Oct 03 '16 at 23:47
  • Well, you know that $F_A[x]+F_B[x]=1$, where $x$ is the mixture median and $F_A,F_B$ are the cumulative distribution functions (CDFs) of the mixture components. For a numerical solution, this is not too bad of a strategy, given that you know $F_k(x)=\Phi[(x-\mu_k)/\sigma_k]$ for $k=A,B$, where $\Phi$ is the CDF of the standard normal distribution. – GeoMatt22 Oct 04 '16 at 01:45
  • @GeoMatt22 Could you please elaborate a bit on how we get $F_A[x] + F_B[x] = 1$? The rest of your argument is perfectly clear! – Orest Xherija Oct 04 '16 at 02:12
  • This is the same as $F[x]=F_A[x]/2 + F_B[x]/2=1/2$, which is the integral of your expression for $f[x]$. – GeoMatt22 Oct 04 '16 at 02:22
  • This is so immediate and I still failed to see it! Thanks a lot for going into the trouble of explaining the obvious! (: – Orest Xherija Oct 04 '16 at 02:30

1 Answer


Let $m$ denote the median of the mixture distribution whose CDF is $\frac 12F_A(x) + \frac 12 F_B(x)$, and write $\mu_A, \sigma_A$ and $\mu_B, \sigma_B$ for the means and standard deviations of the two components (the question's $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$). Then,

\begin{align}
\frac 12F_A(m) + \frac 12 F_B(m) &= \frac 12 \tag{1}\\
&\Downarrow\\
F_A(m) + F_B(m) &= 1\\
&\Downarrow\\
\Phi\left(\frac{m-\mu_A}{\sigma_A}\right) + \Phi\left(\frac{m-\mu_B}{\sigma_B}\right) &= 1\\
&\Downarrow\\
\frac{m-\mu_A}{\sigma_A} + \frac{m-\mu_B}{\sigma_B} &= 0 \tag{2}
\end{align}

where the last implication follows from the fact that $\Phi(x)+\Phi(-x)=1$ together with the strict monotonicity of $\Phi$: if $\Phi(a)+\Phi(b)=1$, then $\Phi(b)=\Phi(-a)$ and hence $b=-a$.

In short, the "computational reasons" that are deterring the OP are not really worrisome at all: solving $(2)$ for $m$ is trivial (multiply through by $\sigma_A\sigma_B$ and collect the terms in $m$), and we get that $m$ is the linear combination $$m = \frac{\mu_A\sigma_B + \mu_B\sigma_A}{\sigma_A+\sigma_B}$$ of $\mu_A$ and $\mu_B$. The coefficients are nonnegative and sum to $1$, so $m$ is a weighted average of the two means in which each mean is weighted by the other component's standard deviation.

Of course, if the mixture weights are $p$ and $1-p$ where $p \neq \frac 12$, that is, the mixture distribution is $$p\cdot F_A(x) + (1-p)\cdot F_B(x), \quad p \neq \frac 12,$$ then we need to solve $$p\cdot \Phi\left(\frac{m-\mu_A}{\sigma_A}\right) + (1-p)\cdot\Phi\left(\frac{m-\mu_B}{\sigma_B}\right) = \frac 12. \tag{3}$$

This will likely need numerical solution for $m$: at least, no straightforward analytical solution to $(3)$ springs to my mind. I strongly suspect that it will turn out that, in general, $m$ is not a linear function of $\mu_A$ and $\mu_B$. My suspicion would be confirmed if I could produce just one specific choice of $p$, $\sigma_A$, and $\sigma_B$ for which $m$ is not a linear function of $\mu_A$ and $\mu_B$, but I don't have a specific instance to offer.
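Both cases can be checked numerically. The following is a minimal Python sketch, assuming only the standard library: the function names (`phi`, `mixture_cdf`, `mixture_median`, `closed_form_median`) and the example parameter values are illustrative choices, and bisection is just one convenient root-finder for equation $(3)$, not the only option. It relies on the fact that the mixture CDF is at most $\frac 12$ at the smaller mean and at least $\frac 12$ at the larger mean, so the median is bracketed by the two means.

```python
import math


def phi(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_cdf(x, p, mu_a, sigma_a, mu_b, sigma_b):
    """CDF of the mixture p*N(mu_a, sigma_a^2) + (1-p)*N(mu_b, sigma_b^2)."""
    return p * phi((x - mu_a) / sigma_a) + (1.0 - p) * phi((x - mu_b) / sigma_b)


def mixture_median(p, mu_a, sigma_a, mu_b, sigma_b, iterations=200):
    """Solve equation (3) for m by bisection (assumes 0 < p < 1).

    At min(mu_a, mu_b) the mixture CDF is <= 1/2 and at max(mu_a, mu_b)
    it is >= 1/2, so the median lies between the two means.
    """
    lo, hi = min(mu_a, mu_b), max(mu_a, mu_b)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, p, mu_a, sigma_a, mu_b, sigma_b) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def closed_form_median(mu_a, sigma_a, mu_b, sigma_b):
    """Median for equal weights (p = 1/2), obtained by solving equation (2)."""
    return (mu_a * sigma_b + mu_b * sigma_a) / (sigma_a + sigma_b)


# Equal weights: the numerical root should agree with the closed form.
# (Illustrative parameter values, not taken from the question.)
print(mixture_median(0.5, 1.0, 2.0, 4.0, 0.5))
print(closed_form_median(1.0, 2.0, 4.0, 0.5))

# Unequal weights: probe linearity in the means with p and the sigmas fixed.
# A linear m would satisfy m(0, 2) == 2 * m(0, 1).
m1 = mixture_median(0.8, 0.0, 1.0, 1.0, 1.0)
m2 = mixture_median(0.8, 0.0, 1.0, 2.0, 1.0)
print(m1, m2, 2.0 * m1)
```

The last three printed numbers give one way to probe the suspicion about $p \neq \frac 12$: with $p$, $\sigma_A$, and $\sigma_B$ held fixed, a median that is linear in $(\mu_A, \mu_B)$ would have to double when both means are doubled, so comparing `m2` with `2.0 * m1` tests that directly for this one parameter choice.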

Dilip Sarwate