
Imagine you have a two-dimensional multivariate normal random variable with $\mu = [0, 0]$ and $\Sigma = \begin{bmatrix}1 & r\\r & 1\end{bmatrix}$. (Conceptually, you have two normal random variables with a correlation of $r$.) You take $N$ samples from this variable, so that you have an $N \times 2$ matrix: the first column contains the samples from the first dimension, and the second column contains the samples from the second dimension. (These columns, of course, have a correlation of $r$.)

Here comes the crucial part. You take the $K$ rows with the highest values in the first column. Then, from among those $K$ rows, you choose the highest value in the second column.

What is the distribution of this final number? It's easy to simulate, but is it analytically solvable (or does anyone know where I could start looking)?
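Since the question notes that this is easy to simulate, here is a minimal Monte Carlo sketch in plain Python (standard library only). The function name `simulate_final_number`, its parameters, and the example values `n=100`, `k=10`, `r=0.5` are illustrative assumptions. A histogram of `draws` approximates the distribution being asked about.

```python
import heapq
import math
import random
import statistics


def simulate_final_number(n, k, r, trials, seed=0):
    """Monte Carlo draws of the final number (hypothetical helper name).

    Each trial draws n pairs from a bivariate normal with zero means, unit
    variances and correlation r, keeps the k rows with the largest first
    coordinate, and records the largest second coordinate among them.
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - r * r)
    results = []
    for _ in range(trials):
        rows = []
        for _ in range(n):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            # (z1, r*z1 + s*z2) has unit variances and correlation r.
            rows.append((z1, r * z1 + s * z2))
        # The k rows with the highest values in the first column.
        top_k = heapq.nlargest(k, rows, key=lambda row: row[0])
        # The highest second-column value among those k rows.
        results.append(max(row[1] for row in top_k))
    return results


# Example parameters (arbitrary, for illustration only).
draws = simulate_final_number(n=100, k=10, r=0.5, trials=2000)
print(f"mean ~ {statistics.mean(draws):.3f}, sd ~ {statistics.stdev(draws):.3f}")
```

Fixing `seed` makes runs reproducible; varying `n`, `k` and `r` shows how the shape of the distribution changes.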

  • It is too complicated to be worth writing analytically. See https://stats.stackexchange.com/questions/416675 for the case $K=1.$ – whuber Jul 27 '19 at 18:38



I don't have a full solution, but I'd start with the following. Your two sequences of random numbers $x_{1i},x_{2i}$ can be generated by applying the Cholesky decomposition to the correlation matrix $\Sigma$ and then multiplying the Cholesky factor by two independent standard normal variables $\xi_{1i},\xi_{2i}$: $$\begin{aligned}x_{1i}&=\xi_{1i}\\ x_{2i}&=\rho\,\xi_{1i}+\sqrt{1-\rho^2}\,\xi_{2i}\end{aligned}$$ where $\rho = r$ is the correlation from the question.
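To make the construction concrete, here is a small sketch in plain Python (no NumPy assumed). The helpers `cholesky`, `correlated_pairs` and `sample_correlation` are hypothetical names written for this illustration. For $\Sigma$ with off-diagonal $\rho$, the lower-triangular factor comes out as $\begin{bmatrix}1 & 0\\ \rho & \sqrt{1-\rho^2}\end{bmatrix}$, which reproduces the two equations above, and the sample correlation of the generated columns should be close to $\rho$.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L^T == a (Cholesky-Banachiewicz, small dense matrices)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def correlated_pairs(n, rho, seed=0):
    """Generate x1, x2 by multiplying the Cholesky factor of Sigma by independent normals."""
    low = cholesky([[1.0, rho], [rho, 1.0]])  # [[1, 0], [rho, sqrt(1 - rho^2)]]
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        xi1, xi2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1.append(low[0][0] * xi1)                    # x1 = xi1
        x2.append(low[1][0] * xi1 + low[1][1] * xi2)  # x2 = rho*xi1 + sqrt(1-rho^2)*xi2
    return x1, x2


def sample_correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


x1, x2 = correlated_pairs(20000, rho=0.6)
print(cholesky([[1.0, 0.6], [0.6, 1.0]]))
print(sample_correlation(x1, x2))
```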

Now, let $i_j, j=1,2,\dots,K$ be the indices of the $K$ largest numbers $x_{1i_j}=\xi_{1i_j}$ of the first sequence. Selecting these also picks out the corresponding subset $x_{2i_j}$ of the second sequence. What is the distribution of $x_{2i_j}$?

Note that $\xi_{2i_j}$ is independent of $x_1$ (the selection looks only at the first column, i.e. at $\xi_1$), so it is still standard normal. So, if you know the distribution of $\xi_{1i_j}$, then the distribution in question is that of $\rho\,\xi_{1i_j}+\sqrt{1-\rho^2}\,\xi_{2i_j}$: a sum of two independent random variables, one of which is definitely normal.
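A quick simulation check of this independence argument, as a sketch in plain Python: `selected_moments` and the parameter values are illustrative assumptions. The $\xi_2$ values picked out by the top-$K$ rule on $\xi_1$ should have mean near 0 and variance near 1, while the selected $\xi_1$ values are shifted upward and much less spread out.

```python
import random
import statistics


def selected_moments(n, k, trials, seed=0):
    """Mean and variance of xi1 and xi2 at the indices of the k largest xi1 values."""
    rng = random.Random(seed)
    sel1, sel2 = [], []
    for _ in range(trials):
        xi1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xi2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # The selection rule looks only at xi1 (i.e. at x1).
        top = sorted(range(n), key=xi1.__getitem__, reverse=True)[:k]
        sel1.extend(xi1[i] for i in top)
        sel2.extend(xi2[i] for i in top)
    return {
        "xi1_mean": statistics.mean(sel1),
        "xi1_var": statistics.pvariance(sel1),
        "xi2_mean": statistics.mean(sel2),
        "xi2_var": statistics.pvariance(sel2),
    }


print(selected_moments(n=100, k=10, trials=1000))
```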

The $K$ largest values $\xi_{1i_j}$ are the top $K$ order statistics of a sample of $N$ standard normal numbers. I still need to think about what their distribution is, but it's definitely NOT normal.
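As a possible starting point, the standard textbook formulas for normal order statistics are below (these are general results, not something derived in this answer). With $\phi$ and $\Phi$ the standard normal density and CDF, the $j$-th largest of $N$ i.i.d. standard normals has density $f_{(j)}$, and the top $K$ values $y_1 > y_2 > \dots > y_K$ have joint density $f$:

```latex
\begin{aligned}
f_{(j)}(y) &= \frac{N!}{(j-1)!\,(N-j)!}\,\Phi(y)^{N-j}\,\bigl[1-\Phi(y)\bigr]^{j-1}\,\phi(y),\\
f(y_1,\dots,y_K) &= \frac{N!}{(N-K)!}\,\Phi(y_K)^{N-K}\prod_{j=1}^{K}\phi(y_j),
\qquad y_1 > y_2 > \dots > y_K.
\end{aligned}
```

For $j=1$ the first line reduces to $N\,\Phi(y)^{N-1}\phi(y)$, the density of the sample maximum.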

Aksakal
  • Thanks for the answer. Can I get a little clarification on your notation? What is $\rho$ representing? And when you write $x_{1i_{j}}$, do you mean the subset consisting of the K largest numbers of $x_{1i}$? Also, why would the distribution of $\xi_{2i_{j}}$ be normal if it's a non-randomly-chosen subset of a normal? I feel like I'm missing something there. – Adam Morris Jul 27 '19 at 14:39
  • @AdamMorris, clarified my answer. Yes: if you take a random sample from a normal distribution, it is of course random; that's how all random sampling works. Since the second set $\xi_2$ is independent of $\xi_1$, no matter how you select from $\xi_1$, the corresponding $\xi_2$ numbers are still a random sample from the normal. – Aksakal Jul 27 '19 at 18:05