What is the density of the $m$'th element of a sorted vector of $n$ uniformly distributed random variables

Question

$X_1, X_2, ..., X_n$ are independent and uniformly distributed on $[0, 1]$. Sorting them yields a vector, whose first and last element have densities that are just the derivatives of products of CDFs.

As regards the second element i tried to use inclusion/exclusion, but the coefficients are only clear to me when $n \leq 3$. How to find the densities for large some $n$ and all $m$?

I've derived it before and I know it follows something like a beta distribution. I've actually written a silly R-package that allows you to take a sorted sample form a discrete distribution in linear time. Let me dig it up for you... — Cliff AB, Feb 07 '16 at 18:40
Possible duplicate of [Order statistic for beta distribution](http://stats.stackexchange.com/questions/156712/order-statistic-for-beta-distribution) — Xi'an, Feb 07 '16 at 20:20

Cliff AB · Accepted Answer · 2017-11-04T15:07:09.217

Start with $n$ sorted uniform(0,1) distributions, denoted $u_1,..., u_n$ ($u_k$ is the $k^{th}$ order statistic). Or as I like to think of it for this problem, $n$ beta(1,1) distributions.

Now consider

$ \lim_{\epsilon \rightarrow 0} \frac{P(x - \epsilon < u_k < x + \epsilon) }{2\epsilon}$

which is, by definition, the pdf of $u_k$. To compute this probability for a given $\epsilon$, we note that there must $k-1$ independent uniform(0,1) variables less than $x - \epsilon$ and $n-k$ independent uniform(0,1) variables greater than $x+\epsilon$ and one uniform(0,1) variable that will fall in the interval $(x - \epsilon, x + \epsilon)$. For a given $\epsilon$, this gives us probability

$\frac{(x-\epsilon)^{k-1} (1 -x +\epsilon)^{n-k} (2\epsilon)}{2\epsilon} \frac{n!}{(k-1)!(n-k)!}$.

Taking the limit as $\epsilon \rightarrow 0$, we get

$x^{k-1} (1 -x)^{n-k} \frac{n!}{(k-1)!(n-k)!}$.

which is the pdf of a Beta($k, n-k+1$) distribution.

Interesting note: if you are interested in sampling from this, in general a Beta($a, b$) distribution is (relatively) slow to sample from due to an intractable integral. On the other hand, sampling from a Beta($1, b$) or a Beta($a, 1$) can be done in closed form. Turns out, sampling from $u_k | u_{k-1}$ will require sampling from a Beta($1, n - k+1$) distribution (and then linearly transforming), rather than a Beta($k, n$). As such, if $k$ is close to 1 or $n$, it is often much faster to sequentially sample $u_1$, $u_2|u_1$, ..., $u_k | u_{k-1}$ or likewise $u_n$, $u_{n-1} | u_n$, ..., $u_k | u_{k+1}$ then to directly sample $u_k$ from the given Beta($k, n$) distribution.

Can be found here: https://en.wikipedia.org/wiki/Beta_distribution — kjetil b halvorsen, Feb 07 '16 at 19:27
I don't recognize your expression as the CDF of any Beta distribution: it's proportional to the PDF of a Beta$(n-k+1,k+1)$ distribution--and that shows that it's incorrect. The CDF is proportional to an incomplete Beta function. — whuber, Nov 01 '17 at 15:38
@whuber: you are correct! It does turn out to be a [beta distribution](https://en.wikipedia.org/wiki/Order_statistic#Order_statistics_sampled_from_a_uniform_distribution), but I had written the wrong beta distribution. Now deriving it on my own is somewhat confusing me: taking the derivative of the expression I wrote above does not seem to lead to a beta pdf, so I will work it on more on my own and update again when I am done. — Cliff AB, Nov 03 '17 at 17:26
One derivation is at https://stats.stackexchange.com/a/4684/919. One place you have gone wrong is in providing an overly narrow characterization of the event $u_k\le x$: this includes all events where *at least* $k$ of the original data are less than or equal to $x$. You should not also require that the remaining $n-k$ data exceed $x$! — whuber, Nov 03 '17 at 18:19

score 2 · Answer 2 · answered Nov 03 '17 at 19:52

A non-rigorous derivation of the density function is given at https://stats.stackexchange.com/a/4684/919. It amounts to characterizing the density of $u_k$ near $x$ as the chance that $k-1$ of the $X_i$ are less than or equal to $x$, one of them is between $x$ and $x+dx$ (for infinitesimal $dx$), and the remainder exceed $x+dx$. This is a multinomial probability equal to

$$f_k(x) = \binom{n}{k-1;\, 1;\, n-k} x^{k-1}(dx)^{1}(1-x-dx)^{n-k}.$$

Ignoring terms of order $(dx)^2$ and--if you prefer--expanding the multinomial coefficient, this is

$$f_k(x) = \frac{n!}{(k-1)!(n-k)!} x^{k-1} (1-x)^{n-k} dx.\tag{1}$$

That's the PDF of a Beta$(k,n+1-k)$ variable. Notice that as $k=1,2,\ldots,n$, its parameters proceed $(1,n),(2,n-1), \ldots, (n,1)$: in particular, they never degenerate to $0$. (This is one quick check that the parameters are the correct ones and aren't one off due to some mistake of arithmetic.)

If you don't like infinitesimals, the same result can be derived beginning with the distribution function

$$F_k(x) = \Pr(u_k \le x).$$

In terms of the original $X_i$, the event $u_k \le x$ is that $k$ or more of the $X_i$ are less than or equal to $x.$ Since uniformity implies $\Pr(X_i\le x)=x$ and the $X_i$ are independent, this is a Binomial probability. We may immediately write it as

$$F_k(x) = \sum_{j=k}^n \binom{n}{j} x^j(1-x)^{n-j}.$$

Differentiating ought to yield the density--and it does, with some work. What is needed is to recognize that

$$\binom{n}{j+1}(j+1) = \frac{n!}{j!(n-j-1)!} = \binom{n}{j}(n-j).\tag{*}$$

Since $F_k$ is expressed as a finite sum, we may differentiate it term-by-term to produce

$$\eqalign{ f_k(x) & = \frac{d}{dx}F_k(x) = \sum_{j=k}^n \binom{n}{j} (jx^{j-1}(1-x)^{n-j}-(n-j)x^j(1-x)^{n-j-1})\\ &= \sum_{j=k-1}^{n-1} \binom{n}{j+1} (j+1)x^{j}(1-x)^{n-j-1}-\sum_{j=k}^n \binom{n}{j}(n-j)x^j(1-x)^{n-j-1}\\ &= \sum_{j=k-1}^{n-1} \binom{n}{j+1} (j+1)x^{j}(1-x)^{n-j-1}-\sum_{j=k}^n \binom{n}{j+1} (j+1)x^j(1-x)^{n-j-1}.\\ }$$

The relation $(*)$ was used in the last step. The preceding step was merely a relabeling of the index $j$ to match up common expressions $x^j(1-x)^{n-j-1}$. Every term but the first cancels out, leaving

$$f_k(x) = \binom{n}{k} (k)x^{k-1}(1-x)^{n-k} = \frac{n!}{(k-1)!(n-k)!} x^{k-1} (1-x)^{n-k}.$$

This is identical to the original expression $(1)$.

Very nice! When I went back and tried to differentiate the CDF, I looked at the sum of $n$ choose $j$'s and got very scared (after realizing my mistake about there being $k$ **or more** $X_i...$)! — Cliff AB, Nov 04 '17 at 15:20

What is the density of the $m$'th element of a sorted vector of $n$ uniformly distributed random variables

2 Answers2