
Suppose $$X \sim \mathcal N_n(\text{diag}(\Sigma), \sigma^2 \Sigma)$$ where $\Sigma$ may be allowed to be low rank, and let $Y = \min_i X_i$.

What can be said about $P\left(Y \geq 0\right)$?

In general, I know that the exact distributions of Gaussian order statistics can be intractable, as in this math.se Q&A and the discussion here, but I'm hoping that the relationship between the mean and the covariance matrix leads to some simplification, or that it helps that I don't need the full distribution of $Y$ but only the probability that it is nonnegative. The $X_i$ not being iid rules out the usual tools I know for minima and maxima, but I'm still hoping something can be done short of numerical integration or simulation for given values of $\Sigma$ and $\sigma$. I'd be very interested in approximations too.


The context for this, and for the unusual mean vector, comes from a now-deleted question on stats.se that essentially asked the following:

If we have $$X\sim\mathcal N_k(\mathbf 0, \sigma^2 I)$$ and nonrandom nonzero vectors $z_1,\dots,z_n\in\mathbb R^k$, what is the probability that $$\|X\|^2 \leq \|X-z_i\|^2 \text{ for all } i?$$

Expanding the squared norm, $$\|X-z_i\|^2 = \|X\|^2 - 2 X^Tz_i + \|z_i\|^2,$$

so the question is equivalent to $P(\|z_i\|^2- 2 X^Tz_i \geq 0 \text{ for all }i)$. I collected the $z_i$ into the columns of a $k\times n$ matrix $Z$ so I can write the random variables in question as an affine transformation of $X$ via $$ \text{diag}(Z^TZ) - 2 Z^TX \sim \mathcal N_n(\text{diag}(Z^TZ), 4\sigma^2 Z^TZ) $$ and I want the probability that this random vector is all non-negative, so this led me to the question I asked. The factored form of $\Sigma$ here is why I want to allow for possibly low rank covariance matrices since I could have $k \leq n$.
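This reduction is easy to sanity-check by simulation. Below is a Monte Carlo sketch (my own, with arbitrary illustrative values of $k$, $n$, $\sigma$, and $Z$) that estimates the original probability directly and compares it against the orthant probability of $\mathcal N_n(\text{diag}(Z^TZ), 4\sigma^2 Z^TZ)$, sampled independently; the covariance here is rank-deficient since $k < n$:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, sigma = 3, 5, 1.5          # arbitrary illustrative values, k < n
Z = rng.normal(size=(k, n))      # columns are the nonrandom z_i
N = 200_000                      # Monte Carlo sample size

# Estimate P(||X||^2 <= ||X - z_i||^2 for all i) directly.
X = sigma * rng.normal(size=(N, k))
dists_sq = np.linalg.norm(X[:, None, :] - Z.T[None, :, :], axis=2) ** 2
direct = np.mean(np.all(dists_sq >= (np.linalg.norm(X, axis=1) ** 2)[:, None],
                        axis=1))

# Estimate the orthant probability by sampling
# W ~ N(diag(Z'Z), 4 sigma^2 Z'Z) independently; Z'Z has rank k < n,
# so the default SVD-based sampler is used and the PSD check is skipped.
G = Z.T @ Z
W = rng.multivariate_normal(np.diag(G), 4 * sigma**2 * G, size=N,
                            check_valid="ignore")
orthant = np.mean(np.all(W >= 0, axis=1))

print(direct, orthant)  # two independent estimates of the same probability
```

The two estimates should agree up to Monte Carlo error (standard error on the order of $10^{-3}$ at this sample size).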

MarianD
jld

1 Answer


At least in the simplest non-trivial case, $n=2$, there are some tractable formulas. Let $$\Sigma=\begin{pmatrix}a & b\\b & d\end{pmatrix}, \text{ where }a>0,\ d>0,\ ad-b^2>0.$$ The probability is given by a messy integral of the form $$P(Y\ge0)=\int_0^\infty\!\!\!\int_0^\infty f_{a,b,d,\sigma}(x_1,x_2)\,dx_1\, dx_2,$$ where $f_{a,b,d,\sigma}$ is the density of $\mathcal N_2\big((a,d)^T, \sigma^2\Sigma\big)$.

However, the integral works out nicely when $b=0$, giving an expression in the regularized upper incomplete gamma function $Q$: $$P(Y\ge0)\big|_{b=0}=\frac14 \left(Q\left(\frac12,\frac{a}{2\sigma^2}\right)-2\right)\! \left(Q\left(\frac12,\frac{d}{2\sigma^2}\right)-2\right). $$ The integral also works out nicely when we differentiate with respect to $b$ under the integral sign: $$ \frac{dP(Y\ge0)}{db}= \int_0^\infty\!\!\!\int_0^\infty \frac{\partial f_{a,b,d,\sigma}(x_1,x_2)}{\partial b}\,dx_1\, dx_2 = \frac{ \exp\left(\frac{-ad(a+d-2b)}{2\sigma^2(ad-b^2)}\right)}{2\pi\sqrt{ad-b^2}}. $$ This lets us calculate the probability at any $b=B$ (for given $a,d,\sigma$) as $$P(Y\ge0)\big|_{b=B} = P(Y\ge0)\big|_{b=0}+\int_0^B \frac{dP(Y\ge0)}{db}\, db.$$ Then:
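As a side note (my own verification, not part of the answer), the $b=0$ expression reduces to the obvious independence formula: using $Q(\tfrac12,x)=\operatorname{erfc}(\sqrt{x})$ and $\operatorname{erfc}(t/\sqrt2)=2\Phi(-t)$,

$$Q\!\left(\tfrac12,\tfrac{a}{2\sigma^2}\right)-2 = 2\Phi\!\left(-\tfrac{\sqrt a}{\sigma}\right)-2 = -2\Phi\!\left(\tfrac{\sqrt a}{\sigma}\right),$$

so

$$P(Y\ge0)\big|_{b=0} = \Phi\!\left(\frac{\sqrt a}{\sigma}\right)\Phi\!\left(\frac{\sqrt d}{\sigma}\right),$$

which is exactly $P(X_1\ge0)P(X_2\ge0)$ for independent $X_1\sim\mathcal N(a,\sigma^2 a)$ and $X_2\sim\mathcal N(d,\sigma^2 d)$.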

  • We can check, e.g., that this correctly gives $P(Y\ge0)=0.4794$ when $a=1,\, b=1/2,\, d=2,\, \sigma=3$.
  • This formula is a computational improvement over the messy integral, replacing an infinite integral in two dimensions with one-dimensional integrals over finite regions.
  • The formula allows asymptotic analysis, e.g. for calculating a limiting probability as $b$ approaches $\sqrt{ad}$, where the messy approach of $\iint\lim_{b\to\sqrt{ad}}f_{a,b,d,\sigma}$ is ill-defined.
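The numerical check in the first bullet can be reproduced with a short script (a sketch assuming SciPy; the $b=0$ factor is written via the equivalent $\Phi$ product rather than the $Q$-function form):

```python
import numpy as np
from scipy import stats, integrate

a, b, d, sigma = 1.0, 0.5, 2.0, 3.0

# P(Y >= 0) at b = 0: product of the two independent marginal
# probabilities, equivalent to the Q-function expression above.
p0 = stats.norm.cdf(np.sqrt(a) / sigma) * stats.norm.cdf(np.sqrt(d) / sigma)

# Integrate the closed form of dP/db from 0 to b.
def dP_db(t):
    return (np.exp(-a * d * (a + d - 2 * t) / (2 * sigma**2 * (a * d - t**2)))
            / (2 * np.pi * np.sqrt(a * d - t**2)))

corr, _ = integrate.quad(dP_db, 0.0, b)
p = p0 + corr

# Cross-check against the bivariate normal orthant probability:
# P(X >= 0) = P(-X <= 0) with -X ~ N(-mu, sigma^2 * Sigma).
mu = np.array([a, d])
cov = sigma**2 * np.array([[a, b], [b, d]])
p_mvn = stats.multivariate_normal(mean=-mu, cov=cov).cdf(np.zeros(2))

print(round(p, 4), round(p_mvn, 4))  # both ≈ 0.4794
```

The one-dimensional quadrature over $[0, b]$ here replaces the two-dimensional integral over the whole quadrant, which is the computational improvement claimed in the second bullet.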

Some variant of this approach may also work for $n>2$, especially in the low-rank situation mentioned in the post. Feynman was famous for solving problems like this by differentiating under the integral sign.

Matt F.