
Four numbers are drawn at random from a standard normal distribution. They are sorted so that $x_1\le x_2 \le x_3 \le x_4$ and grouped into two pairs of closest numbers, $\{x_1, x_2\}$ and $\{x_3, x_4\}$, and the average of each pair is computed. What is the expected average of each pair? I found papers dealing with the averages of 2 out of 3 observations (e.g., here), but I wasn't able to find anything for this specific problem.

whuber
  • It looks like you might have overcomplicated your question. Because the $x_i$ are ordered, the symmetry of the standard Normal distribution implies $(x_1,x_2)$ and $(-x_4,-x_3)$ are identically distributed. Therefore you only have to answer the question for the two smallest out of four. Moreover, the expected average is the average of the expectations of $x_1$ and $x_2$ separately. These expectations can be found in the same manner as for any order statistic in a sample of $n.$ For $n\le 5$ (as I recall) there may be exact formulas; for larger $n,$ numerical integration is required. – whuber Jan 10 '22 at 16:17
  • Further to @whuber's comment, [Harter (1961)](https://doi.org/10.2307/2333139) gives an explicit formula (use $n=4$ and $p=1$) involving an improper integral. There is likely something newer out there. – Stephan Kolassa Jan 10 '22 at 16:44

1 Answer

Let $\Phi$ be the standard Normal distribution function and

$$\phi(z) = \frac{\mathrm{d}}{\mathrm{d}z}\Phi(z) = \frac{1}{\sqrt{2\pi}}e^{-z^2/2}$$

be its density function. The largest of a sample $(x_1,x_2,x_3,x_4)$ from this distribution, often written $x_{4;4},$ has distribution function

$$\Pr(x_{4;4}\le z) = \Pr(x_1\le z;x_2\le z; x_3\le z;x_4\le z) = \Phi(z)^4$$

(the latter arising from the independence of the $x_i$) and therefore has density

$$\phi_{4;4}(z) = \frac{\mathrm{d}}{\mathrm{d}z} \Phi(z)^4 = 4\Phi(z)^{3}\phi(z).$$

Its expectation can be simplified a little via integration by parts upon exploiting the basic relation $\phi^\prime(z) = -z\phi(z),$ whence

$$\int z\,\Phi(z)^{3}\phi(z)\mathrm{d}z = -\phi(z)\Phi^{3}(z)\bigg|_{-\infty}^\infty + \int 3\Phi(z)^{2} \phi(z)^2\mathrm{d}z.$$

At this juncture we need a "trick"--a new idea. One that works is to introduce a new variable $a$ and analyze the function

$$g_2(a) = \int \Phi(a z)^2 \phi(z)\,\mathrm{d}z.$$

This is differentiable, with a derivative we may compute mechanically as

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}a} g_2(a) &= \int \frac{\mathrm{d}}{\mathrm{d}a} \Phi(a z)^2 \phi(z)\,\mathrm{d}z \\ &= 2\int z \phi(az) \Phi(a z) \phi(z)\,\mathrm{d}z \\ &= \frac{2}{2\pi}\int z \Phi(a z)\exp\left(-(z^2 + a^2z^2)/2\right) \,\mathrm{d}z \\ &= \frac{2}{2\pi(1+a^2)}\int z(1+a^2) \Phi(a z)\exp\left(-(z^2 + a^2z^2)/2\right) \,\mathrm{d}z \\ &= -\frac{2}{2\pi(1+a^2)}\int \Phi(a z)\,\frac{\mathrm{d}}{\mathrm{d}z} \exp\left(-(z^2 + a^2z^2)/2\right) \,\mathrm{d}z. \end{aligned}$$

This last expression is a lovely candidate for an integration by parts: because the exponential decreases rapidly for large $|z|,$ the boundary term vanishes, leaving

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}a} g_2(a) &= \frac{2}{2\pi(1+a^2)}\int \frac{\mathrm{d}}{\mathrm{d}z} \Phi(a z)\,\exp\left(-(z^2 + a^2z^2)/2\right) \,\mathrm{d}z \\ &= \frac{2a}{2\pi(1+a^2)\sqrt{2\pi}}\int \exp(-a^2z^2/2)\,\exp\left(-(z^2 + a^2z^2)/2\right) \,\mathrm{d}z \\ &= \frac{2a}{2\pi(1+a^2)\sqrt{1+2a^2}}. \end{aligned}$$

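The last evaluation arises by noting that the integrand, $\exp\left(-(1+2a^2)z^2/2\right),$ is a multiple of a Normal density function (with mean zero and variance $1/(1+2a^2)$), so it integrates to $\sqrt{2\pi/(1+2a^2)}.$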

$g_2(a)$ can be recovered by integrating this expression with respect to $a.$ It can be computed via elementary means (substitutions), yielding

$$g_2(a) = C + \frac{1}{\pi} \tan^{-1}\sqrt{1 + 2a^2}$$

and, since $g_2(0) = \Phi(0)^2\int \phi(z)\mathrm{d}z = 1/4,$ the constant of integration is $C=0.$
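The antiderivative can be checked by hand (the substitution $t=\sqrt{1+2a^2}$ is one that works: it turns the integrand into $\frac{1}{\pi}\,\mathrm{d}t/(1+t^2)$) or numerically. The following is a minimal sketch of a numerical check, not part of the original derivation. The helper names `Phi`, `phi`, `simpson`, `g2_numeric` and `g2_closed` are illustrative, and it assumes that truncating the integral to $[-10, 10]$ and using Simpson's rule is accurate enough.

```python
import math

def Phi(z):
    """Standard Normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard Normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def g2_numeric(a):
    # Assumption: the Normal density is negligible beyond |z| = 10.
    return simpson(lambda z: Phi(a * z) ** 2 * phi(z), -10.0, 10.0)

def g2_closed(a):
    # Closed form from the text, with C = 0.
    return math.atan(math.sqrt(1.0 + 2.0 * a * a)) / math.pi

for a in (0.0, 1.0 / math.sqrt(2.0), 1.0, 3.0):
    print(f"a={a:.4f}  numeric={g2_numeric(a):.8f}  closed={g2_closed(a):.8f}")
```

At $a=1$ the closed form reduces to $\tan^{-1}(\sqrt 3)/\pi = 1/3,$ which agrees with the familiar fact that $\Phi(Z)$ is uniformly distributed when $Z$ is standard Normal.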

Returning to the expectation of the maximum, the integration by parts above gives

$$E[x_{4;4}] = \int z \phi_{4;4}(z)\,\mathrm{d}z = 4 \int 3\Phi(z)^2\phi(z)^2\,\mathrm{d}z = \frac{12}{{2\pi}} \int \Phi(z)^2 e^{-z^2}\,\mathrm{d}z.$$

Substituting $z = u/\sqrt{2}$ puts this into the form

$$E[x_{4;4}] = \frac{12}{\sqrt{2\pi}}\frac{1}{\sqrt 2}\, g_2\left(\frac{1}{\sqrt 2}\right) = \frac{6}{\pi \sqrt{\pi}}\tan^{-1}\sqrt{2} \approx 1.029375\ldots$$
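As a sanity check on this value, here is a small sketch that compares three routes to $E[x_{4;4}]$: direct numerical integration of $z\,\phi_{4;4}(z),$ numerical integration of the form obtained after integrating by parts, and the closed form. The helper names and the choice of Simpson's rule on $[-10, 10]$ are assumptions made for illustration only.

```python
import math

def Phi(z):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard Normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

# Route 1: integrate z times the density 4 Phi^3 phi of the maximum.
direct = simpson(lambda z: z * 4.0 * Phi(z) ** 3 * phi(z), -10.0, 10.0)

# Route 2: the form after integration by parts, (12 / 2 pi) * int Phi^2 exp(-z^2).
by_parts = 12.0 / (2.0 * math.pi) * simpson(
    lambda z: Phi(z) ** 2 * math.exp(-z * z), -10.0, 10.0
)

# Route 3: the closed form 6 / (pi sqrt(pi)) * atan(sqrt(2)).
closed = 6.0 / (math.pi * math.sqrt(math.pi)) * math.atan(math.sqrt(2.0))

print(f"direct={direct:.8f}  by_parts={by_parts:.8f}  closed={closed:.8f}")
```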

This is the only integral we need compute, because the density function of the second-highest sample value is

$$\phi_{4;3}(z) = 12 \Phi(z)^2(1-\Phi(z))\phi(z),$$

giving

$$E[x_{4;3}] = 12 \int z\left[\Phi(z)^2(1-\Phi(z))\phi(z)\right]\,\mathrm{d}z = 4E[x_{3;3}] - 3E[x_{4;4}].$$
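The last equality may be worth spelling out. As a sketch, write $\phi_{n;n}(z) = n\Phi(z)^{n-1}\phi(z)$ for the density of the largest of $n$ values (the case $n=4$ was derived above; $n=3$ follows in the same way). Expanding the bracket then gives

$$\begin{aligned} 12\int z\,\Phi(z)^2(1-\Phi(z))\phi(z)\,\mathrm{d}z &= 4\int z\left[3\Phi(z)^2\phi(z)\right]\mathrm{d}z - 3\int z\left[4\Phi(z)^3\phi(z)\right]\mathrm{d}z \\ &= 4\int z\,\phi_{3;3}(z)\,\mathrm{d}z - 3\int z\,\phi_{4;4}(z)\,\mathrm{d}z \\ &= 4E[x_{3;3}] - 3E[x_{4;4}]. \end{aligned}$$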

To find the expectation of $x_{3;3},$ integrate by parts as before to obtain $E[x_{3;3}] = 6\int \Phi(z)\phi(z)^2\,\mathrm{d}z,$ which after rescaling $z$ is an integral of the form treated in the question "How can I calculate $\int^{\infty}_{-\infty}\Phi\left(\frac{w-a}{b}\right)\phi(w)\,\mathrm dw$". The result is $3/(2\sqrt{\pi}).$ Thus

$$E[x_{4;3}] = 4\frac{3}{2\sqrt\pi} - 3\frac{6}{\pi \sqrt{\pi}}\tan^{-1}\sqrt{2} \approx 0.29701138\ldots$$
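These closed forms can be cross-checked against the general order-statistic density $\frac{n!}{(k-1)!\,(n-k)!}\Phi(z)^{k-1}(1-\Phi(z))^{n-k}\phi(z)$ for the $k$-th smallest of $n$ values. The sketch below integrates it numerically. The function name `expected_order_stat` and its parameters are hypothetical, and Simpson's rule on $[-10, 10]$ is an assumption of this illustration rather than the method used above.

```python
import math

def Phi(z):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard Normal density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def expected_order_stat(k, n, half_width=10.0, steps=4000):
    """Numerical E[x_{n;k}]: the k-th smallest of n standard Normal draws."""
    coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))

    def integrand(z):
        p = Phi(z)
        return z * coef * p ** (k - 1) * (1.0 - p) ** (n - k) * phi(z)

    return simpson(integrand, -half_width, half_width, steps)

sqrt_pi = math.sqrt(math.pi)
e33 = 3.0 / (2.0 * sqrt_pi)
e44 = 6.0 / (math.pi * sqrt_pi) * math.atan(math.sqrt(2.0))
e43 = 4.0 * e33 - 3.0 * e44

print(f"E[x_(3;3)]: numeric={expected_order_stat(3, 3):.8f}  closed={e33:.8f}")
print(f"E[x_(4;4)]: numeric={expected_order_stat(4, 4):.8f}  closed={e44:.8f}")
print(f"E[x_(4;3)]: numeric={expected_order_stat(3, 4):.8f}  closed={e43:.8f}")
```

The same function works for larger $n,$ where, as noted in the comments, closed forms are generally not available.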

Finally, the question asks for the expectations

$$-E[(x_{4;1} + x_{4;2})/2] = E[(x_{4;4} + x_{4;3})/2] = \frac{3}{\sqrt \pi} - \frac{6}{\pi\sqrt\pi} \tan^{-1}\sqrt{2} \approx 0.66319338\ldots$$

These are negatives of each other because the standard Normal distribution has a mean of zero, which implies

$$\begin{aligned} 0 &= (E[x_1] + E[x_2])/2 + (E[x_3] + E[x_4])/2 \\ &= E[x_1+\cdots + x_4]/2 \\&= E[x_{4;1} + \cdots + x_{4;4}]/2 \\&= E[(x_{4;1}+x_{4;2})/2] + E[(x_{4;3}+x_{4;4})/2].\end{aligned}$$
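A quick simulation of the original question offers an independent check. This is only a sketch: the function name `simulate_pair_means`, the number of trials and the seed are arbitrary choices. From the order-statistic expectations computed above, the upper pair should average about $(1.029375 + 0.297011)/2 \approx 0.6632$ and the lower pair about its negative, up to Monte Carlo error.

```python
import random

def simulate_pair_means(trials=200_000, seed=0):
    """Estimate the expected averages of the lower and upper pairs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    low_sum = 0.0
    high_sum = 0.0
    for _ in range(trials):
        # Draw four standard Normal values and sort them.
        x = sorted(rng.gauss(0.0, 1.0) for _ in range(4))
        low_sum += (x[0] + x[1]) / 2.0   # average of the two smallest
        high_sum += (x[2] + x[3]) / 2.0  # average of the two largest
    return low_sum / trials, high_sum / trials

low, high = simulate_pair_means()
print(f"lower pair: {low:.4f}   upper pair: {high:.4f}")
```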

Reference

Similar techniques can be applied to find positive integral moments of order statistics from random Normal samples of sizes $n=1,2,3,4,5.$ For larger sample sizes, numerical integration appears to be required.

Balakrishnan, N. and A. Clifford Cohen, Order Statistics and Inference. Academic Press, 1991: Section 3.9.

whuber