
For iid random variables $X_1,\ldots,X_n$ with common cdf $F$ and pdf $f$:

For integers $1\le r_1 < \cdots < r_k \le n$, I am trying to find the joint pdf of $$ (X_{(r_1)},\ldots,X_{(r_k)}) $$ where $X_{(r_i)}$ is the $r_i$th smallest observation. Has anyone seen the solution to this problem somewhere online? My current attempt:

Choosing $\epsilon$ small enough that only one observation falls in each interval $(x_i-\epsilon,x_i+\epsilon)$:

\begin{align*}
&P\big(X_{(r_1)} \in (x_1 - \epsilon,x_1+\epsilon),\ldots,X_{(r_k)}\in (x_k - \epsilon,x_k+\epsilon)\big)\\
&=P\big( n-k ~\text{of}~ X_1,\ldots,X_n \in (-\infty, x_1-\epsilon),\\
&\qquad 1 ~\text{of}~ X_1,\ldots,X_n \in (x_1-\epsilon,x_1+\epsilon),\\
&\qquad \ldots,\\
&\qquad 1 ~\text{of}~ X_1,\ldots,X_n \in (x_k-\epsilon,x_k+\epsilon)\big)\\
&+P\big( n-k-1 ~\text{of}~ X_1,\ldots,X_n \in (-\infty, x_1-\epsilon),\\
&\qquad 1 ~\text{of}~ X_1,\ldots,X_n \in (x_1-\epsilon,x_1+\epsilon),\\
&\qquad \ldots,\\
&\qquad 1 ~\text{of}~ X_1,\ldots,X_n \in (x_k-\epsilon,x_k+\epsilon),\\
&\qquad 1 ~\text{of}~ X_1,\ldots,X_n \in (x_k+\epsilon, +\infty)\big)\\
&+\cdots
\end{align*}
and so on, accounting for every way of splitting the remaining $n-k$ observations between the region below $x_1$ and the region above $x_k$.

Each of these terms is a multinomial probability that can be written in terms of the cdf $F$ (dividing by the interval widths and letting $\epsilon\to0$). After all the algebra, I arrive at:

$$ f_{X_{(r_1)},\ldots,X_{(r_k)}}(x_1,\ldots,x_k)= n! \prod_{i=1}^{k} f(x_i) \left[ \frac{F(x_1)^{n-k}}{(n-k)!} +\frac{F(x_1)^{n-k-1} (1-F(x_k))}{(n-k-1)!} +\cdots+\frac{ (1-F(x_k))^{n-k}}{(n-k)!} \right] $$

I am not sure whether I am on the right track. If anyone has seen this distribution before, could you let me know whether my attempt so far is correct?
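One quick sanity check on any candidate density is whether it integrates to 1 and whether it changes with the ranks $r_i$. A minimal Python sketch for the smallest case, $n=2$, $k=1$, with Uniform(0,1) observations (a test case chosen here for illustration, not part of the question): with $n-k=1$ the bracket above has only the two terms $F(x)/1!+(1-F(x))/1!=1$, so the candidate reduces to $2f(x)$ for either value of $r_1$, whereas the minimum and maximum of two uniform draws have the standard densities $2(1-x)$ and $2x$.

```python
import math

def candidate_density(x):
    # The question's formula for n = 2, k = 1 with Uniform(0, 1) data
    # (f(x) = 1, F(x) = x). With n - k = 1 the bracket has only two terms,
    # F(x)^1/1! + (1 - F(x))^1/1!, and r_1 does not appear anywhere.
    n, F, f = 2, x, 1.0
    return math.factorial(n) * f * (F / math.factorial(1) + (1 - F) / math.factorial(1))

def known_density(x, r):
    # Densities of the minimum (r = 1) and maximum (r = 2) of two Uniform(0, 1) draws.
    return 2.0 * (1.0 - x) if r == 1 else 2.0 * x

def integrate_unit_interval(g, m=10_000):
    # Midpoint rule on [0, 1].
    h = 1.0 / m
    return h * sum(g((i + 0.5) * h) for i in range(m))

print(integrate_unit_interval(candidate_density))              # approximately 2.0
print(integrate_unit_interval(lambda x: known_density(x, 1)))  # approximately 1.0
print(integrate_unit_interval(lambda x: known_density(x, 2)))  # approximately 1.0
```

The candidate integrates to about 2 and is the same for $r_1=1$ and $r_1=2$, while each known density integrates to 1, so in this case the candidate cannot be the density of $X_{(r_1)}$.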

WeakLearner
  • I think you can refer to this question for (a): http://stats.stackexchange.com/questions/161145/distribution-of-sum-of-order-statistics – Deep North Aug 01 '15 at 06:25
  • $=n!f(y_1)f(y_2)\cdots f(y_r)\frac{[1-F(y_r)]^{n-r}}{(n-r)!}$, where $y_1$ corresponds to $X_{(r_1)}$ in your case – Deep North Aug 01 '15 at 06:34
  • @DeepNorth the very last statement of this paper: https://www.google.com.au/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CB0QFjAAahUKEwi6556S3ofHAhUE2KYKHXxZCDU&url=http%3A%2F%2Fwww.springer.com%2Fcda%2Fcontent%2Fdocument%2Fcda_downloaddocument%2F9789491216824-c2.pdf%3FSGWID%3D0-0-45-1386236-p174707463&ei=dqe8VbrUHYSwmwX8sqGoAw&usg=AFQjCNGvsme54qftXnmSC77YVAGYCYl8gw&sig2=bRzcu6s13p3LITRcAbFqpg&bvm=bv.99261572,d.dGY&cad=rja has an expression for the joint pdf. I'm not sure whether it agrees with the result you derived in the question you linked. – WeakLearner Aug 01 '15 at 11:10
  • @DeepNorth also, your answer in (a) assumes $r_1 = 1$, whereas in the problem I am trying to solve that is not necessarily the case. – WeakLearner Aug 01 '15 at 11:13
  • I think the paper only discusses the joint pdf of at most two order statistics. – Deep North Aug 01 '15 at 11:30
  • Two points are missing from this question: you did not tell us anything about the $X_i$, $i=1,\ldots,n$, their mutual dependence, or the type of variables. In the usual situation where the variables are iid, the answer is $f_{X_{(1)},\ldots,X_{(n)}}(x_1,\ldots,x_n)=n!\prod_{i=1}^n f_X(x_i)$ for $x_1<\cdots<x_n$; see https://en.wikipedia.org/wiki/Order_statistic – TPArrow Aug 01 '15 at 11:51
  • @DeepNorth see the statement of the question, exercise 2.7 on page 19, and the solution at the very end of page 21 – WeakLearner Aug 01 '15 at 12:35
  • @Hamed I updated the question to mention that they are iid. Again, I am familiar with the joint pdf of $X_{(1)},\ldots,X_{(n)}$, but that is not the situation here. – WeakLearner Aug 01 '15 at 12:37
  • @dimebucker91: Since [the reference you provide](https://www.google.com.au/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CB0QFjAAahUKEwi6556S3ofHAhUE2KYKHXxZCDU&url=http%3A%2F%2Fwww.springer.com%2Fcda%2Fcontent%2Fdocument%2Fcda_downloaddocument%2F9789491216824-c2.pdf%3FSGWID%3D0-0-45-1386236-p174707463&ei=dqe8VbrUHYSwmwX8sqGoAw&usg=AFQjCNGvsme54qftXnmSC77YVAGYCYl8gw&sig2=bRzcu6s13p3LITRcAbFqpg&bvm=bv.99261572,d.dGY&cad=rja) has an expression for the joint pdf, what exactly is your question? Are you trying to prove this result, or do you disagree with it? – Xi'an Aug 01 '15 at 17:14
  • @Xi'an I am trying to prove this result. I don't think my attempt is compatible with what they've written down, and I'm confused by their derivation. – WeakLearner Aug 02 '15 at 02:53
  • @Xi'an I'm also not sure I agree with it, since it contains the $F(x_m) - F(x_{m-1})$ term. This doesn't make sense to me: we are trying to find the joint distribution of $k$ order statistics, so there is a block of $k$ observations in the middle, and the remaining $n-k$ observations are spread in all possible combinations BEFORE and AFTER that block of $k$, but not in between, if that makes intuitive sense. – WeakLearner Aug 02 '15 at 03:51
  • It is a matter of notation: $x_m$ is the value taken by $X_{k(m)}$, not $X_m$. Look at the power:$$[F(x_m)-F(x_{m-1})]^{k(m)-k(m-1)-1}$$clearly considers a block of size $(k(m)-k(m-1)-1)$ between $X_{k(m-1)}$ and $X_{k(m)}$. – Xi'an Aug 02 '15 at 08:12
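Xi'an's reading of the exponents matches the infinitesimal argument from the question once the count in each gap is fixed by the ranks: exactly $r_m-r_{m-1}-1$ observations lie strictly between the cells around $x_{m-1}$ and $x_m$ (with $r_0=0$, $r_{k+1}=n+1$, $x_0=-\infty$, $x_{k+1}=+\infty$). The Python sketch below (the function name and test values are illustrative assumptions; Uniform(0,1) data by default) computes the multinomial probability of that configuration, divides by $(2\epsilon)^k$, and lets $\epsilon$ shrink. For $n=5$, ranks $(2,4)$ and points $(0.3,0.7)$, each of the three gaps holds one observation, so the limit should be $5!\cdot0.3\cdot0.4\cdot0.3=4.32$.

```python
import math

def leading_cell_ratio(n, ranks, xs, eps, F=lambda t: t):
    """Probability of the leading multinomial configuration for the event
    X_(r_m) in (x_m - eps, x_m + eps), m = 1..k, divided by (2 eps)^k.
    Configuration: exactly one draw in each small cell and exactly
    r_m - r_{m-1} - 1 draws in the gap between consecutive cells
    (r_0 = 0, r_{k+1} = n + 1). F defaults to the Uniform(0, 1) cdf."""
    k = len(ranks)
    r = [0, *ranks, n + 1]
    gap_counts = [r[m + 1] - r[m] - 1 for m in range(k + 1)]
    # Gap m runs from x_m + eps to x_{m+1} - eps, with F = 0 and 1 at the two ends.
    lower = [0.0] + [F(x + eps) for x in xs]
    upper = [F(x - eps) for x in xs] + [1.0]
    coef = math.factorial(n)
    for c in gap_counts:
        coef //= math.factorial(c)  # the k singleton cells contribute 1! = 1
    prob = float(coef)
    for lo, hi, c in zip(lower, upper, gap_counts):
        prob *= (hi - lo) ** c
    for x in xs:
        prob *= F(x + eps) - F(x - eps)
    return prob / (2 * eps) ** k

for eps in (1e-2, 1e-3, 1e-4):
    print(eps, leading_cell_ratio(5, (2, 4), (0.3, 0.7), eps))
```

The ratio approaches 4.32 as $\epsilon\to0$; configurations with two or more draws in one small cell only add terms of higher order in $\epsilon$, so they vanish after dividing by $(2\epsilon)^k$.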



Since
$$f_{X_{(1)},\ldots,X_{(n)}}(x_1,\ldots,x_n)=n!\prod_{i=1}^n f_X(x_i)\,\mathbb{I}_{x_1\le x_2\le\ldots\le x_n},$$
the marginal density of $(X_{(r_1)},\ldots,X_{(r_k)})$ is obtained by integrating out the other $n-k$ coordinates (with some abuse of notation, e.g. in the integral bounds):
\begin{align}
f_{X_{(r_1)},\ldots,X_{(r_k)}}(x_{r_1},\ldots,x_{r_k})
&=\int f_{X_{(1)},\ldots,X_{(n)}}(x_1,\ldots,x_n)\,\mathbb{I}_{x_{r_1}\le x_{r_2}\le\ldots\le x_{r_k}}\prod_{i\notin\{r_1,\ldots,r_k\}}\text{d}x_i\\
&=\int n!\prod_{i=1}^n f_X(x_i)\,\mathbb{I}_{x_1\le x_2\le\ldots\le x_n}\prod_{i\notin\{r_1,\ldots,r_k\}}\text{d}x_i\\
&=n!\int \prod_{i=1}^{r_1-1} f_X(x_i)\,f_X(x_{r_1})\prod_{i=r_1+1}^{r_2-1} f_X(x_i)\cdots\\
&\qquad\cdots f_X(x_{r_k})\prod_{i=r_k+1}^{n}f_X(x_i)\,\mathbb{I}_{x_1\le x_2\le\ldots\le x_n}\prod_{i\notin\{r_1,\ldots,r_k\}}\text{d}x_i\\
&=n!\prod_{i=1}^{r_1-1}\int_{x_{i-1}}^{x_{i+1}} f_X(x_i)\,\text{d}x_i\,f_X(x_{r_1})\prod_{i=r_1+1}^{r_2-1}\int_{x_{i-1}}^{x_{i+1}} f_X(x_i)\,\text{d}x_i\,f_X(x_{r_2})\cdots\\
&\qquad\cdots f_X(x_{r_k})\prod_{i=r_k+1}^{n}\int_{x_{i-1}}^{x_{i+1}} f_X(x_i)\,\text{d}x_i\,\mathbb{I}_{x_{r_1}\le x_{r_2}\le\ldots\le x_{r_k}}\\
&=n!\,\frac{F_X(x_{r_1})^{r_1-1}}{(r_1-1)!}\,\frac{[F_X(x_{r_2})-F_X(x_{r_1})]^{r_2-r_1-1}}{(r_2-r_1-1)!}\cdots\\
&\qquad\cdots\frac{[1-F_X(x_{r_k})]^{n-r_k}}{(n-r_k)!}\prod_{i=1}^k f_X(x_{r_i})\,\mathbb{I}_{x_{r_1}\le x_{r_2}\le\ldots\le x_{r_k}}
\end{align}
which is the result given in the reference (except that the reference uses $(x_1,\ldots,x_k)$ as the argument of the density).
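The final expression can be coded directly as one factor $[F_X(x_{r_m})-F_X(x_{r_{m-1}})]^{r_m-r_{m-1}-1}/(r_m-r_{m-1}-1)!$ per gap, with the conventions $r_0=0$, $r_{k+1}=n+1$, $F_X(x_{r_0})=0$ and $F_X(x_{r_{k+1}})=1$, so that the $n-r_k$ observations above $x_{r_k}$ contribute $[1-F_X(x_{r_k})]^{n-r_k}/(n-r_k)!$. A minimal Python sketch (the function name and the Uniform(0,1) test distribution are assumptions for illustration) with three checks: for uniform data and $k=1$ it gives the Beta$(r,n-r+1)$ density, ranks $(1,\ldots,r)$ recover the expression in Deep North's comment, and for $n=4$, ranks $(2,3)$ it integrates to 1.

```python
import math

def order_stat_joint_density(xs, ranks, n, F, f):
    """Joint density of (X_(r_1), ..., X_(r_k)) at sorted points xs, for n iid
    draws with cdf F and pdf f. Uses r_0 = 0, r_{k+1} = n + 1 and
    F(x_{r_0}) = 0, F(x_{r_{k+1}}) = 1, so each gap between selected ranks
    contributes [F(upper) - F(lower)]^gap / gap!."""
    if any(a > b for a, b in zip(xs, xs[1:])):
        return 0.0  # indicator I{x_{r_1} <= ... <= x_{r_k}}
    r = [0, *ranks, n + 1]
    u = [0.0, *(F(x) for x in xs), 1.0]
    value = float(math.factorial(n))
    for m in range(len(ranks) + 1):
        gap = r[m + 1] - r[m] - 1  # unselected observations in this block
        value *= (u[m + 1] - u[m]) ** gap / math.factorial(gap)
    for x in xs:
        value *= f(x)
    return value

# Example distribution (an assumption for illustration): Uniform(0, 1).
def unif_F(t):
    return min(max(t, 0.0), 1.0)

def unif_f(t):
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

# k = 1: should match the Beta(r, n - r + 1) density.
n, r, x = 5, 2, 0.3
beta = math.factorial(n) / (math.factorial(r - 1) * math.factorial(n - r)) * x ** (r - 1) * (1 - x) ** (n - r)
print(order_stat_joint_density([x], [r], n, unif_F, unif_f), beta)

# Ranks (1, 2) with n = 3: n! * (1 - F(y_2))^(n - r) / (n - r)! = 3! * 0.5 = 3.0
print(order_stat_joint_density([0.2, 0.5], [1, 2], 3, unif_F, unif_f))

def mass(n=4, ranks=(2, 3), m=200):
    # Midpoint rule over the triangle x1 <= x2 in [0, 1]^2.
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        for j in range(i, m):
            w = 0.5 if i == j else 1.0  # diagonal cells are half inside the triangle
            total += w * order_stat_joint_density([(i + 0.5) * h, (j + 0.5) * h], ranks, n, unif_F, unif_f)
    return total * h * h

print(mass())
```

Swapping `unif_F` and `unif_f` for the cdf and pdf of another continuous distribution gives the general case.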
The last equality follows from repeated integrations of $f_X(x)[F_X(x)-F_X(x_j)]^\alpha$; for instance, the first group of integrals gives
\begin{align*}
\int_{x_{1}\le\ldots\le x_{r_1}}\prod_{i=1}^{r_1-1} f_X(x_i)\,\text{d}x_i
&=\int_{x_2\le\ldots\le x_{r_1}}\left\{\int_{-\infty}^{x_2} f_X(x_1)\,\text{d}x_1\right\}\prod_{i=2}^{r_1-1} f_X(x_i)\,\text{d}x_i\\
&=\int_{x_2\le\ldots\le x_{r_1}}F_X(x_2)\prod_{i=2}^{r_1-1}f_X(x_i)\,\text{d}x_i\\
&=\int_{x_3\le\ldots\le x_{r_1}}\left\{\int_{-\infty}^{x_3} f_X(x_2)F_X(x_2)\,\text{d}x_2\right\}\prod_{i=3}^{r_1-1}f_X(x_i)\,\text{d}x_i\\
&=\int_{x_3\le\ldots\le x_{r_1}}\frac{F_X(x_3)^2}{2!}\prod_{i=3}^{r_1-1}f_X(x_i)\,\text{d}x_i\\
&=\ldots\\
&=\frac{F_X(x_{r_1})^{r_1-1}}{(r_1-1)!}
\end{align*}
The middle groups are handled in the same way: integrating the $r_m-r_{m-1}-1$ variables lying between $x_{r_{m-1}}$ and $x_{r_m}$ gives $[F_X(x_{r_m})-F_X(x_{r_{m-1}})]^{r_m-r_{m-1}-1}/(r_m-r_{m-1}-1)!$.
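The identity behind this chain is that the iterated integral of $\prod f_X(x_i)$ over $x_1\le\cdots\le x_m\le t$ equals $F_X(t)^m/m!$: it is the probability that $m$ iid draws all fall below $t$ and come out in increasing order. A hedged Monte Carlo sketch in Python, using Exponential(1) draws purely as an example distribution (any continuous $F_X$ would do); here $m$ plays the role of $r_1-1$ and $t$ that of $x_{r_1}$, and the function names are hypothetical.

```python
import math
import random

def mc_ordered_below(m, t, trials=200_000, seed=0):
    """Monte Carlo estimate of P(X_1 <= X_2 <= ... <= X_m <= t) for iid
    Exponential(1) draws, i.e. the iterated integral of prod f(x_i)
    over {x_1 <= ... <= x_m <= t}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        draws = [rng.expovariate(1.0) for _ in range(m)]
        if draws[-1] <= t and all(a <= b for a, b in zip(draws, draws[1:])):
            hits += 1
    return hits / trials

def closed_form(m, t):
    F = 1.0 - math.exp(-t)  # Exponential(1) cdf
    return F ** m / math.factorial(m)

for m, t in [(2, 0.5), (3, 1.0)]:
    print(m, t, mc_ordered_below(m, t), closed_form(m, t))
```

The two columns should agree to within Monte Carlo error (a few times $10^{-4}$ with the default number of trials).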

Xi'an
  • One last issue I am having: when evaluating one of the expressions like $\prod_{i=1}^{r_1-1}\int_{x_{i-1}}^{x_{i+1}} f_X(x_i)\text{d}x_i\,f_X(x_{r_1})$, wouldn't this evaluate to: \begin{align} \prod_{i=1}^{r_1-1} \int_{x_{i-1}}^{x_{i+1}}f(x_i)dx_i &=\prod_{i=2}^{r_1-1} \int_{x_{i-1}}^{x_{i+1}}f(x_i)dx_i \left (\int_{-\infty}^{x_2}f(x_2)dx_2 \right)\\ &=\prod_{i=2}^{r_1-1} \int_{x_{i-1}}^{x_{i+1}}f(x_i)F(x_2)dx_i\\ &=\prod_{i=3}^{r_1-1} \int_{x_{i-1}}^{x_{i+1}}f(x_i) dx_i\left(\int_{x_1}^{x_3}f(x_2)F(x_2)dx_2 \right)\\ &=\cdots \end{align} – WeakLearner Aug 03 '15 at 03:02
  • As indicated in my reply, the notation $\int_{x_{i-1}}^{x_{i+1}}f$ is abusive. The correct presentation is the one in my second set of equations. In your first equation, the dummy variable of the inner integral should be $x_1$, not $x_2$. And the lower bound of the inner integral in the last equation should be $-\infty$, not $x_1$, which has already been integrated out. – Xi'an Aug 03 '15 at 06:21
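Xi'an's correction about the bounds can be checked exactly in the Uniform(0,1) case, where $F_X(t)=t$ and the bottom of the support, 0, plays the role of $-\infty$. A small Python sketch using exact rational arithmetic (polynomials stored as coefficient lists, a representation chosen here purely for illustration; the helper names are hypothetical): integrating $x_1$ over $(0,x_2)$, then $x_2$ over $(0,x_3)$, and so on, leaves $t^m/m!$, which is the uniform case of $F_X(x_{r_1})^{r_1-1}/(r_1-1)!$ with $m=r_1-1$.

```python
from fractions import Fraction
import math

def integrate_from_zero(poly):
    """Return the coefficients of y -> integral_0^y p(s) ds, where poly holds
    the coefficients of p (lowest degree first)."""
    return [Fraction(0)] + [Fraction(c) / (i + 1) for i, c in enumerate(poly)]

def iterated_uniform_integral(m):
    """Coefficients, in t, of the integral of f(x_1)...f(x_m) over
    0 <= x_1 <= ... <= x_m <= t for Uniform(0, 1) data (f = 1).
    x_1 is integrated first over (0, x_2), then x_2 over (0, x_3), and so on:
    every inner integral starts at the bottom of the support (0 here, -inf in
    general), never at a variable that has already been integrated out."""
    poly = [Fraction(1)]
    for _ in range(m):
        poly = integrate_from_zero(poly)  # multiply by f = 1, then integrate
    return poly

for m in range(1, 5):
    coeffs = iterated_uniform_integral(m)
    print(m, coeffs[-1], Fraction(1, math.factorial(m)))
```

Only the degree-$m$ coefficient is nonzero, and it equals $1/m!$.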