
Suppose $X_1,X_2,\ldots,X_n$ are i.i.d. random variables with an absolutely continuous distribution.

We say the observation $X_i$ has rank $R_i$ if $$X_i=X_{(R_i)},\quad i=1,2,\ldots,n,$$

where $X_{(k)}$ is the $k$-th order statistic.

I am looking for the correlation between $X_i$ and $R_i$ for each $i=1,\ldots,n$.

Let us assume that $X_i\sim F$, where the distribution function $F$ is known. The difficulty I am facing is that I do not know the joint distribution of $(X_i,R_i)$, which is required for finding $E(X_iR_i)$ in the expression for the covariance. But I suspect that the correlation can be derived regardless.

We can find the mean and variance of $X_i$ and $R_i$ separately once we have the distribution $F$ at hand. But how can we find the covariance?
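Before attacking this analytically, the correlation can at least be probed numerically. Here is a quick Monte Carlo sketch of my own (Uniform$(0,1)$ chosen purely for concreteness) that estimates $\operatorname{Corr}(X_1,R_1)$:

```python
import numpy as np

# Monte Carlo estimate of Corr(X_1, R_1) for n i.i.d. Uniform(0,1) draws
rng = np.random.default_rng(0)
n, reps = 5, 20000

x1 = np.empty(reps)
r1 = np.empty(reps)
for k in range(reps):
    x = rng.uniform(size=n)
    ranks = x.argsort().argsort() + 1  # rank of each observation (ties have probability 0)
    x1[k], r1[k] = x[0], ranks[0]

# empirical correlation between X_1 and its rank R_1
print(np.corrcoef(x1, r1)[0, 1])
```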

I know that the conditional distribution $[(X_1,\ldots,X_n)\mid X_{(1)},\ldots,X_{(n)}]$ has the form

$$P\left[X_1=x_1,\ldots,X_n=x_n\mid X_{(1)}=x_{(1)},\ldots,X_{(n)}=x_{(n)}\right]=\frac{1}{n!}\mathbf1_{(x_1,\ldots,x_n)\in A},$$

where $A$ is the set of the $n!$ permutations of $(x_{(1)},\ldots,x_{(n)})$.

And that the rank vector $(R_1,R_2,\ldots,R_n)$ is also distributed as

$$P(R_1=r_1,\ldots,R_n=r_n)=\frac{1}{n!}\mathbf 1_{(r_1,\ldots,r_n)\in B},$$

where $B$ is the set of the $n!$ permutations of $(1,2,\ldots,n)$.

Any hints on how to proceed would be great.

ttnphns
StubbornAtom

2 Answers


I will give a hint. The key concept is exchangeability, meaning that the random vector $(X_1, \dotsc, X_n)$ has the same distribution as $(X_{\pi 1}, \dotsc, X_{\pi n})$ for every permutation $\pi$ of $(1,2,\dotsc, n)$. You can then check that the vector of ranks $(R_1, \dotsc, R_n)$ is also exchangeable. Since exchangeability is a generalization of i.i.d., your eventual result will generalize as well.

We need something more: the distribution of the $n$ pairs $$ \left( (\begin{smallmatrix} X_1\\R_1\end{smallmatrix}), \dotsc, (\begin{smallmatrix} X_n\\R_n\end{smallmatrix}) \right) $$ is itself exchangeable. (Of course, we first need to assume the relevant expectations exist.)

Now calculate (for some $j$ between $1$ and $n$): \begin{align} \DeclareMathOperator{\E}{\mathbb{E}} & \sum_\pi \E X_{\pi j} R_{\pi j} \\ = {} & \E \sum_\pi X_{\pi j} R_{\pi j} \\ = {} & \E \sum_{r=1}^n \sum_{\pi\colon R_{\pi j}=r} X_{\pi j} R_{\pi j} \\ = {} & (n-1)! \sum_{r=1}^n \E X_{\pi j}\, r \\ = {} & (n-1)!\, \mu\, \frac{n (n+1)}{2} \end{align} where $\mu$ is the common expectation of the $X_i$. You should be able to conclude.

kjetil b halvorsen
  • Thanks. Could you tell me why the $(n-1)!$ comes in the 3rd step? – StubbornAtom Nov 08 '18 at 17:06
  • Because the sum over $n!$ permutations is divided into $n$ classes, each one fixing the rank of variable $j$, and each having $(n-1)!$ elements. – kjetil b halvorsen Nov 08 '18 at 17:29
  • Does this mean that for each $j=1,\ldots,n$, $$E(X_jR_j)=\frac{\mu(n-1)!(n+1)}{2 n!}$$? I think I am misunderstanding the argument. – StubbornAtom Nov 09 '18 at 13:26
  • No. By exchangeability, all the expectations summed over are equal, so the sum is that common expectation times $n!$. So the expectation is $\mu \times \frac{n+1}{2}$, which you can see equals $\E X_j \times \E R_j$. It will follow that the correlation is zero. – kjetil b halvorsen Nov 09 '18 at 13:30
  • I have been thinking about this. There is [this](https://math.stackexchange.com/questions/1689769/correlation-between-a-random-variable-and-its-rank/) similar question on Math.SE where the answer suggests that the correlation can be non-zero. I will get back to you when I make some progress. – StubbornAtom Nov 16 '18 at 16:04
  • The answer given there cannot be correct, see my comment there. – kjetil b halvorsen Nov 16 '18 at 16:31
  • I have attempted a solution myself after coming across this discussion in a textbook. Please have a look when you can. It appears that the correlation is non-zero in general. I should also add that @joriki's answer on Math.SE matches the general expression I found. – StubbornAtom Jan 23 '21 at 11:34
  • @StubbornAtom: You are right, my answer is wrong. I will look into it to see where it went wrong ... – kjetil b halvorsen Jan 23 '21 at 22:43

We can find $\operatorname E\left[R_1X_1\right]$ using the conditional distribution of $X_1$ given $R_1$. The distribution of $X_1$ conditioned on $R_1=j$ is simply the distribution of $X_{(j)}$, since $R_1=j \implies X_1=X_{(j)}$ by definition for every $j=1,\ldots,n$.

Hence,

\begin{align} \operatorname E\left[R_1X_1\right]&=\sum_{j=1}^n \operatorname E\left[R_1X_1\mid R_1=j\right]\Pr(R_1=j) \\&=\frac1n\sum_{j=1}^n j\operatorname E\left[X_1\mid R_1=j\right] \\&=\frac1n\sum_{j=1}^n j\operatorname E\left[X_{(j)}\right] \end{align}

Now $R_1$ has a uniform distribution on $\{1,2,\ldots,n\}$ with mean $\frac{n+1}2$ and variance $\frac{n^2-1}{12}$.

So if $\sigma^2$ is the variance of $X_1$, then

$$\operatorname{Corr}(X_1,R_1)=\left(\frac{12}{n^2-1}\right)^{1/2}\frac{\sum_{j=1}^n j\operatorname E\left[X_{(j)}\right]- (n(n+1)/2)\operatorname E[X_1]}{n\sigma}$$
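As a sanity check on this expression (my own sketch, not part of the derivation above), take the $X_j$ to be Uniform$(0,1)$, for which $\operatorname E[X_{(j)}]=\frac{j}{n+1}$, $\operatorname E[X_1]=\frac12$ and $\sigma^2=\frac1{12}$ are known. The formula then collapses to $\sqrt{(n-1)/(n+1)}$:

```python
import math

def corr_uniform(n):
    """Evaluate the correlation expression for n i.i.d. Uniform(0,1) variables,
    using the known order-statistic means E[X_(j)] = j/(n+1)."""
    s = sum(j * j / (n + 1) for j in range(1, n + 1))  # sum of j * E[X_(j)]
    sigma = math.sqrt(1 / 12)
    return math.sqrt(12 / (n**2 - 1)) * (s - (n * (n + 1) / 2) * 0.5) / (n * sigma)

for n in (2, 5, 20):
    print(n, corr_uniform(n), math.sqrt((n - 1) / (n + 1)))
```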

Nonparametric Statistical Inference (5th ed.) by Gibbons and Chakraborti discusses this result on pages 191-192:

[Excerpt from Gibbons and Chakraborti, pp. 191–192, omitted]

The authors subsequently give an alternative expression for the correlation by deriving

$$\sum_{j=1}^n j\operatorname E\left[X_{(j)}\right]=n(n-1)\operatorname E\left[X_1F(X_1)\right]+n\operatorname E[X_1]\,,$$

where $F$ is the common cdf of the $X_j$'s.
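As a quick check of this identity (a worked example of my own, not from the book), take $F$ to be Uniform$(0,1)$, so that $\operatorname E[X_{(j)}]=\frac{j}{n+1}$ and $\operatorname E[X_1F(X_1)]=\operatorname E[X_1^2]=\frac13$. Both sides then equal $\frac{n(2n+1)}{6}$:

$$\sum_{j=1}^n j\operatorname E\left[X_{(j)}\right]=\sum_{j=1}^n\frac{j^2}{n+1}=\frac{n(2n+1)}{6},\qquad n(n-1)\cdot\frac13+n\cdot\frac12=\frac{n(2n+1)}{6}.$$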

And finally,

$$\boxed{\operatorname{Corr}(X_1,R_1)=\left(\frac{12(n-1)}{n+1}\right)^{1/2}\frac1{\sigma}\left[\operatorname E\left[X_1F(X_1)\right]-\frac12 \operatorname E[X_1]\right]}$$
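The boxed formula can be checked by simulation (my own sketch, not from Gibbons and Chakraborti). For Exponential$(1)$, $\operatorname E[X_1F(X_1)]=1-\int_0^\infty xe^{-2x}\,dx=\frac34$ and $\sigma=1$, so the formula predicts $\sqrt{3(n-1)/(4(n+1))}$, e.g. $\approx 0.707$ for $n=5$:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 40000

x1 = np.empty(reps)
r1 = np.empty(reps)
for k in range(reps):
    x = rng.exponential(size=n)
    r1[k] = (x <= x[0]).sum()  # rank of x[0]; ties have probability 0
    x1[k] = x[0]

predicted = math.sqrt(3 * (n - 1) / (4 * (n + 1)))
print(np.corrcoef(x1, r1)[0, 1], predicted)  # the two values should be close
```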

StubbornAtom