6

For three independent normally distributed continuous random variables X, Y, and Z (each with its own mean and standard deviation), I need a way to calculate

$P(Y \geq X, Y \leq Z)$

I know that I can do this by the following:

$P(Y \geq X, Y \leq Z) = P(Y \geq X) \cdot P(Y \leq Z | Y \geq X)$

I am able to calculate $P(Y \geq X)$ using the following relation: $P(Y \geq X) = P(Y - X \geq 0)$

However, I'm having trouble calculating precisely $P(Y \leq Z | Y \geq X)$

Macro
  • 40,561
  • 8
  • 143
  • 148
Abey
  • 63
  • 4
  • 3
    Hint: Conditioned on $Y = y$, the event $\{Y \geq X, Y \leq Z\}$ has conditional probability $$P\{Y \geq X, Y \leq Z \mid Y = y\} = P\{X \leq y, Z \geq y \mid Y = y\} = \Phi((y-\mu_X)/\sigma_x)(1 - \Phi((y-\mu_z)/\sigma_Z)).$$ Multiply by the density of $Y$ and integrate. – Dilip Sarwate May 24 '12 at 02:39
  • Is this a homework problem? Darn it. I have an answer I'm proud of, but I'm not sure if I ought to post it. – Cyan May 24 '12 at 03:33
  • Just to check - are you calculating $P(x \leq y \leq z)$ or do you already know that $x \leq z$ and are calculating the probability that $y$ is inbetween? – jbowman May 24 '12 at 03:56
  • Cyan, no it's not homework, it's just a problem that was driving me mad. – Abey May 24 '12 at 04:04
  • Hi jbowman. x is not necessarily <= z, so it would be P(x <= y <= z). Thanks. – Abey May 24 '12 at 04:05
  • Macro's hint is definitely true. I have checked this by dividing the Y distribution into 1/(2^11) parts from -6 standard deviations to +6 standard deviations. At each point of demarcation I calculated p = P{X <=y, Z >=y | Y = y} * (density at y). To calculate the area for each chunk, I did (y2 - y1) * ((p2 + p1) / 2). The answer is just the sum of the chunk areas. This seems to be a slow way though, so I will study Cyan's answer this coming week to see if I can get a way to do it without iteration. – Abey May 24 '12 at 15:05
  • @Abey, If you're using R or MATLAB w/ Statistics Toolbox, I can give you the code right now... – Cyan May 25 '12 at 04:29
  • Cyan, if you could post the code in R, that would be great. Thanks. :) – Abey May 25 '12 at 23:26
  • @Abey, sorry for the delay -- I clean forgot about making the offer! Here it is. – Cyan May 31 '12 at 01:42

2 Answers2

8

One relatively easy approach is to consider $X$, $Y$, and $Z$ as having a joint multivariate normal distribution.

$\left[\begin{array}{c}X\\Y\\Z\end{array}\right]\sim\mathrm{MVN}\left(\left[\begin{array}{c}\mu_{X}\\\mu_{Y}\\\mu_{Z}\end{array}\right],\left[\begin{array}{ccc}\sigma_{X}^{2} & 0 & 0\\0 & \sigma_{Y}^{2} & 0\\0 & 0 & \sigma_{Z}^{2}\end{array}\right]\right)$

Let

$\left[\begin{array}{c} U\\ V\end{array}\right]=\left[\begin{array}{c} X-Y\\ Z-Y\end{array}\right]=\left[\begin{array}{ccc} 1 & -1 & 0\\ 0 & -1 & 1\end{array}\right]\left[\begin{array}{c} X\\ Y\\ Z\end{array}\right]$

Then by standard results on affine transformations of multivariate normal distributions,

$\left[\begin{array}{c} U\\ V\end{array}\right]\sim\mathrm{MVN}\left(\left[\begin{array}{c} \mu_{X}-\mu_{Y}\\ \mu_{Z}-\mu_{Y}\end{array}\right],\left[\begin{array}{cc} \sigma_{X}^{2}+\sigma_{Y}^{2} & \sigma_{Y}^{2}\\ \sigma_{Y}^{2} & \sigma_{Z}^{2}+\sigma_{Y}^{2}\end{array}\right]\right)$

And since $P(Y \geq X, Y \leq Z) = P(U \leq 0, V \geq 0)$, you want the probability mass of this bivariate distribution in the second quadrant. This is not analytically solvable in general, but is easy to compute. If $\mu_X = \mu_Y = \mu_Z$, then there is an analytical expression (from equation 73 here):

$P(U \leq 0, V \geq 0) = \frac{1}{2} \cos^{-1}\left(\frac{\sigma^2_{Y}}{\sqrt{(\sigma^2_{X} + \sigma^2_{Y}) (\sigma^2_{Z} + \sigma^2_{Y})}}\right)$.

Added: Here's R code to compute the probability.

install.packages("mvtnorm")
library(mvtnorm)
mu_x <- -1.4
mu_y <- 2
mu_z <- 1.7
mu_vec <- c(mu_x- mu_y, mu_z - mu_y) 
var_x <- 9
var_y <- 9
var_z <- 16
Sigma <- var_y + matrix(c(var_x, 0, 0 , var_z), nrow = 2)
pmvnorm(lower = c(-Inf, 0), upper = c(0, Inf), mean = mu_vec, sigma = Sigma)
Cyan
  • 2,748
  • 17
  • 21
  • Thank you Cyan. Although I'm not experienced on the level you post about, I will study upon it. Much appreciated. :) – Abey May 24 '12 at 04:20
  • What pieces are you missing? Maybe I can point you to some resources. – Cyan May 24 '12 at 04:26
  • Honestly, I am not very familiar with the standardized ways of expressing these concepts. Thus, the meaning of those first three parts is unclear to me. I understand the last line; however, the means of the three distributions are not necessarily the same. – Abey May 24 '12 at 04:34
  • [The bivariate normal distribution](http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_binormal_distri.htm#ani_binormal) might be a good starting point -- the linked page has nice animations. Then you'll need to know about [vectors](http://en.wikipedia.org/wiki/Euclidean_vector) (especially [in Cartesian space](http://en.wikipedia.org/wiki/Euclidean_vector#In_Cartesian_space)) and [matrices](http://en.wikipedia.org/wiki/Matrix_(mathematics)). (cont'd) – Cyan May 24 '12 at 05:11
  • After that you'll be ready to tackle [random vectors](http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_random_vector.htm) and the [multivariate normal distribution](http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_multinormal_distri.htm). – Cyan May 24 '12 at 05:11
  • Cyan, are there some typos in your final displayed equation? The denominator of the argument of the arccos is of the form $\sqrt{a\cdot a}$. Also, $\sigma_Z^2$ does not appear anywhere in the formula which is a bit surprising. – Dilip Sarwate May 24 '12 at 12:51
  • Cyan, I have run the R code, and it works great. Thank you! :) – Abey May 31 '12 at 14:46
1

I might just make many draws from the distribution and calculate the rate that the event you are interested in occurs. In R:

N=10^7
 x=rnorm(N,mu_x,sig_x)
 y=rnorm(N,mu_y,sig_y)
 z=rnorm(N,mu_z,sig_z)
 sum(x<y & y >z )/N

It is just an estimation so maybe do it a couple times. Quick and dirty

Seth
  • 577
  • 4
  • 11
  • 1
    The final line of code should be "sum(x – Cyan May 31 '12 at 13:04
  • This is a great way to check an answer. But did you notice the word "precisely" in the question? – whuber May 31 '12 at 14:13
  • Thanks Seth, and ditto on good idea to check the answer. – Abey May 31 '12 at 15:11
  • I noticed the word 'precisely', I just ignored it. Just kidding. Both cyan's and this method can be computed with any level of precision. – Seth May 31 '12 at 15:15
  • Up to a point Seth: the precision is inherently limited by your computing capabilities and lifetime of the universe. Remember, the SD of a simulated estimate scales only as $N^{-1/2}$. That will make it prohibitively difficult to attain more than about six significant figures; your example only gets about four sig figs. (Every once in a while even six sig figs is not good enough as a check, so this is not entirely nit-picking.) – whuber May 31 '12 at 18:07