
Let $X_{(1)}\leq X_{(2)}$ be the order statistics for a random sample of size $2$ from a normal distribution with mean $\mu$ and variance $\sigma ^{2}$.

Evaluate $\operatorname{E}(X_{(1)})$, $\operatorname{E}(X_{(2)})$, $\operatorname{Var}(X_{(1)})$, $\operatorname{Var}(X_{(2)})$ and $\operatorname{Cov}(X_{(1)},X_{(2)})$.

My attempt: In general, for a random sample of size $n$ with distribution function $F$ and density function $f$, I know that the density function of $X_{(j)}$ is given by $$f_{X_{(j)}}(t)=\frac{n!}{(j-1)!(n-j)!}\left[F(t)\right]^{j-1}\left[1-F(t)\right]^{n-j}f(t), \qquad -\infty<t<\infty .$$ In particular, after several calculations, in our case ($n=2$) we have

$$f_{X_{(j)}}(t)=\left\{\begin{array}{ll}\frac{1}{\sigma \sqrt{2\pi}}\left[1-\mathrm{erf}\left(\frac{t-\mu}{\sigma \sqrt{2}}\right)\right]e^{-\left(\frac{t-\mu}{\sigma \sqrt{2}}\right)^{2}} & \mbox{if }j=1, \\ \frac{1}{\sigma \sqrt{2\pi}}\left[1+\mathrm{erf}\left(\frac{t-\mu}{\sigma \sqrt{2}}\right)\right]e^{-\left(\frac{t-\mu}{\sigma \sqrt{2}}\right)^{2}} & \mbox{if }j=2, \end{array}\right.$$ for $-\infty<t<\infty$.

Therefore, the expectation is

$$\operatorname{E}(X_{(j)})=\left\{\begin{array}{ll}\frac{1}{\sigma \sqrt{2\pi}}{\displaystyle \int_{-\infty}^{\infty} t\left[1-\mathrm{erf}\left(\frac{t-\mu}{\sigma \sqrt{2}}\right)\right]e^{-\left(\frac{t-\mu}{\sigma \sqrt{2}}\right)^{2}}\,dt} & \mbox{if }j=1, \\ \frac{1}{\sigma \sqrt{2\pi}}{\displaystyle \int_{-\infty}^{\infty} t\left[1+\mathrm{erf}\left(\frac{t-\mu}{\sigma \sqrt{2}}\right)\right]e^{-\left(\frac{t-\mu}{\sigma \sqrt{2}}\right)^{2}}\,dt} & \mbox{if }j=2. \end{array}\right.$$
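(As a numerical sanity check, not part of the derivation: the following Python sketch, with arbitrary test values $\mu=1$, $\sigma=2$, confirms that each density above integrates to $1$ and that the expectation integrals agree with the closed forms derived in the answers below.)

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

mu, sigma = 1.0, 2.0  # arbitrary test values

def f_order(t, j):
    """Density of X_(j) for a sample of size 2 from N(mu, sigma^2)."""
    z = (t - mu) / (sigma * np.sqrt(2))
    sign = -1.0 if j == 1 else 1.0
    return (1.0 + sign * erf(z)) * np.exp(-z**2) / (sigma * np.sqrt(2 * np.pi))

for j in (1, 2):
    mass = quad(lambda t: f_order(t, j), -np.inf, np.inf)[0]
    mean = quad(lambda t: t * f_order(t, j), -np.inf, np.inf)[0]
    print(j, mass, mean)
# j=1: mass ~ 1, mean ~ mu - sigma/sqrt(pi) ~ -0.128
# j=2: mass ~ 1, mean ~ mu + sigma/sqrt(pi) ~  2.128
```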

The problems begin when I want to calculate $\operatorname{Var}(X_{(1)})$, $\operatorname{Var}(X_{(2)})$ and $\operatorname{Cov}(X_{(1)},X_{(2)})$, because I do not know the density functions of the random variables $X_{(j)}^{2}$ for $j=1,2$ and $X_{(1)}X_{(2)}$. I have not been able to calculate these densities, which is basically what I need, although I do not know whether there is another way of doing all this without having to calculate them.

Diego Fonseca
  • I think this is a step in the right direction http://stats.stackexchange.com/questions/61080/how-can-i-calculate-int-infty-infty-phi-left-fracw-ab-right-phiw?lq=1 but it would require some work to apply the results here. – Sycorax Feb 26 '17 at 22:14

2 Answers


When two variables $(X_1,X_2)$ are independent and identically distributed with a continuous distribution having density $f$, the joint PDF of their order statistics $(X_{(1)}, X_{(2)})$ is

$$2 f(x_1) f(x_2) \mathcal{I}(x_2 \gt x_1).\tag{1}$$

We know how moments depend on location parameters $\mu$ and scale parameters $\sigma$, so it suffices to solve the problem for $\mu=0$ and $\sigma=1$.

[Figure: contour plots of the density functions]

These figures illustrate the following analysis. At left is a contour plot of the joint density of $(X_1,X_2)$. In the middle is a contour plot of the joint density of the order statistics $(1)$ (it is identical in appearance to the left plot but restricted to the region $x_{(2)}\ge x_{(1)}$; all contour values have been doubled, too), along with vectors depicting the new variables $(U,V)$. At the right is the joint density in $(u,v)$ coordinates, along with vectors depicting the order statistics $(X_{(1)}, X_{(2)})$. Computing the moments in $(u,v)$ coordinates is easy. Simple formulas connect these moments to the moments of the original order statistics.

Suppose $f$ is symmetric (as are all Normal distributions). Since $X_1 + X_2 = X_{(1)} + X_{(2)}$ and $(-X_{(2)}, -X_{(1)})$ has the same distribution as $(X_{(1)}, X_{(2)})$, $$-\mathbb{E}(X_{(1)}) = \mathbb{E}(X_{(2)}) = \nu,$$ say, and obviously $$\operatorname{Var}(X_{(1)})= \operatorname{Var}(X_{(2)}) = \tau^2,$$ say.

At this point let's exploit some special properties of Normal distributions. Upon rotating $(X_{(1)}, X_{(2)})$ clockwise by $\pi/4$ to $U=(X_{(1)}+X_{(2)})/\sqrt{2}$ and $V=(X_{(2)}-X_{(1)})/\sqrt{2}$, the density $(1)$ becomes that of a bivariate standard Normal variable $(U,V)$ truncated to the domain $V \gt 0$. It is immediate that $U$ has a standard Normal distribution and $V$ has a half-Normal distribution. Consequently

$$\mathbb{E}(U)=0, \ \mathbb{E}(V) = \sqrt{\frac{2}{\pi}},\ \operatorname{Var}(U)=1,\ \text{and}\ \operatorname{Var}(V) = 1 - \mathbb{E}(V)^2 = 1 - \frac{2}{\pi}.$$
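(A quick simulation, offered here as a sketch and not as part of the argument, corroborates these four moments; the sample size and seed are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.standard_normal((10**6, 2)), axis=1)  # rows: (X_(1), X_(2))
u = (x[:, 0] + x[:, 1]) / np.sqrt(2)  # rotated coordinates
v = (x[:, 1] - x[:, 0]) / np.sqrt(2)

print(u.mean(), u.var())             # ~ 0 and ~ 1 (standard Normal)
print(v.mean(), np.sqrt(2 / np.pi))  # half-Normal mean, ~ 0.7979
print(v.var(), 1 - 2 / np.pi)        # half-Normal variance, ~ 0.3634
```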

Relating these to the original variables gives

$$\cases{ 1 = \operatorname{Var}(U) = \operatorname{Var}\left(\frac{1}{\sqrt{2}}\left(X_{(1)}+X_{(2)}\right)\right) = \frac{1}{2}\left(\tau^2 + \tau^2+2\operatorname{Cov}(X_{(1)},X_{(2)})\right) \\ 1 - \frac{2}{\pi} = \operatorname{Var}(V) = \cdots = \frac{1}{2}\left(\tau^2 + \tau^2-2\operatorname{Cov}(X_{(1)},X_{(2)})\right). }$$

The solution to these simultaneous linear equations is

$$\tau^2 = 1 - \frac{1}{\pi},\ \operatorname{Cov}(X_{(1)},X_{(2)}) = \frac{1}{\pi}.$$

In the same manner, expressing the expectations of $U$ and $V$ in terms of those of $X_{(1)}$ and $X_{(2)}$ gives equations for $\nu$ whose solution is $\nu = \sqrt{1/\pi}$.

Returning to the original question, where the variables are scaled by $\sigma$ and shifted by $\mu$, the answers must therefore be

$$\mathbb{E}(X_{(i)}) = \mu + (-1)^i \sigma \sqrt{\frac{1}{\pi}}$$

and

$$\operatorname{Var}\left(X_{(1)}, X_{(2)}\right) = \sigma^2\pmatrix{1-\frac{1}{\pi} & \frac{1}{\pi} \\ \frac{1}{\pi} & 1 - \frac{1}{\pi}}.$$
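(For readers who want to verify these closed forms numerically, here is a short Monte Carlo sketch; the test values $\mu = 3$, $\sigma = 1.5$ and the seed are arbitrary.)

```python
import numpy as np

mu, sigma = 3.0, 1.5  # arbitrary test values
rng = np.random.default_rng(0)
x = np.sort(rng.normal(mu, sigma, size=(10**6, 2)), axis=1)

print(x.mean(axis=0))           # ~ (mu - sigma/sqrt(pi), mu + sigma/sqrt(pi))
print(np.cov(x, rowvar=False))  # compare with the theoretical matrix below
print(sigma**2 * np.array([[1 - 1/np.pi, 1/np.pi],
                           [1/np.pi, 1 - 1/np.pi]]))
```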

whuber
  • Does the last expression refer to "covariance" or "variance"? In either case, why is it not a number? Why is it a matrix? – Diego Fonseca Feb 26 '17 at 23:41
  • The variance of a vector-valued random variable is the full "variance-covariance" matrix. It contains all variances and covariances of the components of the variable. – whuber Feb 26 '17 at 23:53
  • W Huber, $E(X_{(1)}) = \mu - \sigma \sqrt{1/\pi}$, not $-E(X_{(2)}) = -\mu - \sigma \sqrt{1/\pi}$ – GoF_Logistic Feb 27 '17 at 15:24
  • @whuber I don't understand why you say $\operatorname{Var}(X_{(1)})= \operatorname{Var}(X_{(2)})$. Assuming $\mu=0$, note that $$E(X_{(1)}^{2})=2\sigma^{2}-E(X_{(2)}^{2}).$$ Therefore $$\operatorname{Var}(X_{(1)})=E(X_{(1)}^2)-\left[E(X_{(1)})\right]^{2}=2 \sigma ^{2} - E(X_{(2)}^{2})-\left[E(X_{(2)})\right]^{2}.$$ I do not see how this last expression relates the two variances. – Diego Fonseca Feb 27 '17 at 16:07
  • As I remarked, the distributions of the two order statistics are negatives of each other. Therefore their variances must be the same. – whuber Feb 27 '17 at 17:42
  • @GoF Thank you for pointing that out: I was trying to be too economical of notation. The statement is now fixed. – whuber Feb 27 '17 at 17:44

Here is a brute-force answer that lacks the elegance of whuber's calculations but arrives at the same conclusions.

With $X_i,\ i = 1, 2,$ denoting independent standard normal random variables and $$(W,Z) = \left(\min(X_1,X_2),\max(X_1,X_2)\right) = \left (X_{(1)},X_{(2)}\right),$$ we have that $f_{X_1,X_2}(x,y)= \phi(x)\phi(y)$ and $f_{W,Z}(w,z)= \displaystyle \begin{cases}\displaystyle 2\phi(w)\phi(z), & z>w,\\ \ \\ 0, & z<w, \end{cases}$

where $\phi(\cdot)$ denotes the standard normal density function. Now, \begin{align} E[W] &= \int_{-\infty}^\infty \int_{-\infty}^\infty w\cdot f_{W,Z}(w,z)\, \mathrm dz\, \mathrm dw\\ &= \int_0^\infty \int_{\pi/4}^{5\pi/4} r\cos(\theta)\cdot \frac{1}{\pi} \exp\left(\frac{-r^2}{2}\right)\, r\,\mathrm d\theta \, \mathrm dr &\scriptstyle{\text{change to polar coordinates}}\\ &= \left. \int_0^\infty \sin(\theta)\right|_{\pi/4}^{5\pi/4} \cdot \frac{1}{\pi} r^2 \exp\left(\frac{-r^2}{2}\right) \, \mathrm dr\\ &= -\frac{\sqrt 2}{\pi}\int_0^\infty r^2 \exp\left(\frac{-r^2}{2}\right) \, \mathrm dr &\scriptstyle{\text{now re-write the constant}}\\ &= -\frac{1}{\sqrt \pi}\int_{-\infty}^\infty r^2 \frac{1}{\sqrt{2\pi}}\exp\left(\frac{-r^2}{2}\right) \, \mathrm dr &\scriptstyle{\text{and recognize the integral}}\\ &= -\frac{1}{\sqrt \pi}, \end{align} and since $W+Z = X_{(1)}+X_{(2)} = X_1+X_2$, we deduce that $$E[Z] = E[X_{1}+X_{2}]-E[W] = 0 - \left(-\frac{1}{\sqrt \pi}\right) = \frac{1}{\sqrt \pi}.$$ Similarly, \begin{align} E[W^2] &= \int_{-\infty}^\infty \int_{-\infty}^\infty w^2\cdot f_{W,Z}(w,z)\, \mathrm dz\, \mathrm dw\\ &= \int_0^\infty \int_{\pi/4}^{5\pi/4} r^2\cos^2(\theta)\cdot \frac{1}{\pi} \exp\left(\frac{-r^2}{2}\right)\, r\,\mathrm d\theta \, \mathrm dr &\scriptstyle{\text{change to polar coordinates}}\\ &= \left. \int_0^\infty \frac{2\theta+\sin(2\theta)}{4}\right|_{\pi/4}^{5\pi/4} \cdot \frac{1}{\pi} r^3 \exp\left(\frac{-r^2}{2}\right) \, \mathrm dr\\ &= \frac{1}{2}\int_0^\infty r^3 \exp\left(\frac{-r^2}{2}\right) \, \mathrm dr &\scriptstyle{\text{now set }r^2/2 = t}\\ &= \int_0^\infty t \exp\left(-t\right) \, \mathrm dt \\ &= 1, \end{align} and since $W^2+Z^2 = X_{(1)}^2+X_{(2)}^2 = X_1^2+X_2^2$, we have that $$E[W^2+Z^2]= 1+E[Z^2]=E[X_1^2+X_2^2]=2 \implies E[Z^2] = E[W^2]=1.$$ It follows that $\operatorname{var}(W) = \operatorname{var}(Z) = 1-\frac{1}{\pi}.$

Finally, \begin{align} \operatorname{cov}(X_{(1)},X_{(2)})&= \operatorname{cov}(W,Z)\\ &= E[WZ] - E[W]E[Z]\\ &= E[X_1X_2] + \frac{1}{\pi}&\scriptstyle{\text{because }WZ = X_1X_2~\text{and }E[W]E[Z] = -\frac{1}{\pi}}\\ &= E[X_1]E[X_2] + \frac{1}{\pi}&\scriptstyle{\text{because }X_1~\text{and }X_2~\text{are independent}}\\ &= \frac{1}{\pi}. \end{align}
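(The double integrals above can also be checked numerically; the following is a sketch using scipy's `dblquad` over the region $z > w$, not part of the original derivation.)

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import norm

def moment(g):
    """E[g(W, Z)] under the joint density 2*phi(w)*phi(z) on z > w.

    dblquad evaluates its integrand as func(z, w): w is the outer
    variable on (-inf, inf) and z runs from w to infinity.
    """
    return dblquad(lambda z, w: g(w, z) * 2 * norm.pdf(w) * norm.pdf(z),
                   -np.inf, np.inf, lambda w: w, lambda w: np.inf)[0]

EW = moment(lambda w, z: w)
EW2 = moment(lambda w, z: w**2)
EWZ = moment(lambda w, z: w * z)
EZ = -EW  # by symmetry

print(EW, -1 / np.sqrt(np.pi))   # ~ -0.5642
print(EW2, 1.0)                  # ~ 1
print(EWZ - EW * EZ, 1 / np.pi)  # cov(W, Z) ~ 0.3183
```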


If $X_1$ and $X_2$ are scaled by $\sigma$ and translated by $\mu$ to become i.i.d. $N(\mu,\sigma^2)$ variables, then we readily get $$E[X_{(1)}] = \mu - \frac{\sigma}{\sqrt{\pi}}, \quad E[X_{(2)}] = \mu + \frac{\sigma}{\sqrt{\pi}},\\ \operatorname{var}(X_{(1)}) = \operatorname{var}(X_{(2)}) = \sigma^2\left(1-\frac{1}{\pi}\right),\\ \operatorname{cov}(X_{(1)},X_{(2)}) = \frac{\sigma^2}{\pi}.$$

Dilip Sarwate