
We have a bivariate normal process where $X, Y \sim N(0, \sigma^2)$ with zero covariance (for jointly normal variables this means $X$ and $Y$ are independent).

(For convenience we can assert that $\sigma = 1$, or that we have a good estimate for its value.)

What is the distribution of the random variable $R(n) = \sqrt{\bar{x}^2 + \bar{y}^2}$, where $\bar x = \frac1n\sum_{i=1}^n x_i$ and $\bar y = \frac1n\sum_{i=1}^n y_i$? That is, what is the distribution of the Euclidean distance between the sample center of $n$ points and the true center (at the origin)?

Note that as defined:

  1. $R(n) \ge 0$
  2. $E[R(n)] \to 0$ monotonically as $n \to \infty$
  3. The Rayleigh distribution gives us $E[R(1)] = \sigma \sqrt{\pi/2} \approx 1.25\sigma$

Furthermore, based on a Monte Carlo simulation for $n \in [2, 25]$ with $\sigma = 1$:

  1. Variance decreases monotonically as $n$ increases
  2. Skewness appears constant across $n$ at about 0.63
  3. Excess kurtosis appears constant across $n$ at about 0.24
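A Monte Carlo experiment of this kind can be sketched with Python's standard library alone. This is not the original poster's code: the function names `simulate_R` and `moments`, the default number of trials, and the seed are illustrative assumptions, and "kurtosis" is taken to mean excess kurtosis (fourth standardized moment minus 3).

```python
import math
import random


def simulate_R(n, sigma=1.0, trials=20_000, seed=0):
    """Draw `trials` realizations of R(n): the distance from the sample
    center of n i.i.d. N(0, sigma^2) points in the plane to the origin."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    out = []
    for _ in range(trials):
        xbar = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        ybar = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        out.append(math.hypot(xbar, ybar))  # sqrt(xbar^2 + ybar^2)
    return out


def moments(samples):
    """Return (mean, variance, skewness, excess kurtosis) of a sample,
    computed from population (1/m) central moments."""
    m = len(samples)
    mean = sum(samples) / m
    c2 = sum((s - mean) ** 2 for s in samples) / m
    c3 = sum((s - mean) ** 3 for s in samples) / m
    c4 = sum((s - mean) ** 4 for s in samples) / m
    return mean, c2, c3 / c2 ** 1.5, c4 / c2 ** 2 - 3.0
```

For instance, `moments(simulate_R(10))` should give a skewness near 0.63 and an excess kurtosis near 0.24, up to Monte Carlo noise; the pure-Python loop is slow, so looping over all of $n = 2, \dots, 25$ takes a while.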

(This question is a simplified version of a slightly more complicated one that seems to have gotten derailed in complications.)

feetwet



The sample means are zero-mean normal r.v.'s:

$$\bar X \sim N(0,\sigma^2/n),\;\; \bar Y \sim N(0,\sigma^2/n)$$

Then we have the r.v.'s

$$Z_x = \left(\frac{\bar X}{\sigma/\sqrt n}\right)^2 = \frac n{\sigma^2}\bar X^2 \sim \chi^2(1),\;\;Z_y = \left(\frac{\bar Y}{\sigma/\sqrt n}\right)^2 =\frac n{\sigma^2}\bar Y^2\sim \chi^2(1),$$

Since $\bar X$ and $\bar Y$ are independent, so are $Z_x$ and $Z_y$, and therefore the r.v. $$W = Z_x + Z_y =\frac n{\sigma^2}\left(\bar X^2+\bar Y^2\right)\sim \chi^2(2)$$

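A $\chi^2(2)$ variable is a Gamma variable with shape $k=1$ and scale $\theta=2$, i.e. an exponential with mean 2. Rescaling, the squared distance $W_n = \frac{\sigma^2}{n}W = \bar X^2+\bar Y^2$ satisfies

$$W_n \sim \text {Gamma}(k=1, \theta = 2\sigma^2/n) = \text{Exponential with mean } 2\sigma^2/n,$$ i.e.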

$$f_{W_n}(w_n) = \frac {n}{2\sigma^2}\cdot \exp\Big \{-\frac {n}{2\sigma^2} w_n\Big\}$$
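This step can be checked by simulation: if the derivation holds, the empirical CDF of $W_n = \bar X^2 + \bar Y^2$, built from raw points, should match $1 - e^{-n w/(2\sigma^2)}$, and its sample mean should be close to $2\sigma^2/n$. The helper names `exp_cdf` and `check_W` and the grid of test points are hypothetical choices for illustration; the grid is in absolute units and suits $\sigma = 1$ with small $n$.

```python
import math
import random


def exp_cdf(w, sigma, n):
    """CDF of W_n = Xbar^2 + Ybar^2: exponential with mean 2 sigma^2 / n."""
    return 1.0 - math.exp(-n * w / (2.0 * sigma ** 2)) if w > 0 else 0.0


def check_W(n, sigma=1.0, trials=50_000, seed=1, grid=(0.05, 0.1, 0.2, 0.5)):
    """Simulate W_n from raw N(0, sigma^2) points and return its sample mean
    plus (w, empirical CDF, exponential CDF) for each w in `grid`."""
    rng = random.Random(seed)
    w = []
    for _ in range(trials):
        xbar = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        ybar = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        w.append(xbar ** 2 + ybar ** 2)
    mean_w = sum(w) / trials
    rows = [(t, sum(v <= t for v in w) / trials, exp_cdf(t, sigma, n)) for t in grid]
    return mean_w, rows
```

With $n = 5$ and $\sigma = 1$, `check_W(5)` should report a mean near $2/5 = 0.4$ and empirical CDF values within a percent or so of the exponential CDF.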

Define $R_n = \sqrt {W_n} = \sqrt{\bar X^2 + \bar Y^2}$, which is exactly $R(n)$ from the question. By the change-of-variable formula we have

$$W_n = R_n^2 \Rightarrow \frac {dW_n}{dR_n} = 2R_n$$ and so

$$f_{R_n}(r_n) = 2r_n\frac {n}{2\sigma^2}\cdot \exp\Big \{-\frac {n}{2\sigma^2} r_n^2\Big\} = \frac {r_n}{\alpha^2} \exp\Big \{-\frac {r_n^2}{2\alpha^2} \Big\},\;\;\alpha \equiv \sigma/\sqrt n$$
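The change of variables can also be verified numerically. A minimal sketch, with hypothetical helper names and Simpson's rule as one convenient quadrature choice: it checks that $f_{R_n}(r) = 2r\,f_{W_n}(r^2)$ pointwise, that $f_{R_n}$ integrates to 1, and that its mean is $\alpha\sqrt{\pi/2}$. The integration range is truncated at $12\alpha$, where the Rayleigh tail $e^{-72}$ is negligible.

```python
import math


def f_W(w, sigma, n):
    """Exponential density of W_n, with rate n / (2 sigma^2)."""
    lam = n / (2.0 * sigma ** 2)
    return lam * math.exp(-lam * w) if w >= 0 else 0.0


def f_R(r, sigma, n):
    """Rayleigh density of R_n, with scale alpha = sigma / sqrt(n)."""
    a2 = sigma ** 2 / n  # alpha^2
    return r / a2 * math.exp(-r * r / (2.0 * a2)) if r >= 0 else 0.0


def simpson(g, lo, hi, steps=2000):
    """Composite Simpson's rule for the integral of g over [lo, hi]."""
    if steps % 2:
        raise ValueError("steps must be even")
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0
```

For example, with $\sigma = 1.7$ and $n = 4$, `simpson(lambda r: f_R(r, 1.7, 4), 0.0, 12 * 0.85)` should be 1 to many decimal places.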

So $R_n$, like $R(1)$, follows a Rayleigh distribution, now with scale parameter $\alpha = \sigma/\sqrt n$. We have

$$\begin{align} &E(R_n) = (\sigma/\sqrt n)\sqrt {\pi/2} \Rightarrow \lim_{n\rightarrow \infty} E(R_n) =0\\ &\operatorname {Var}(R_n) = \frac {4-\pi}{2}(\sigma/\sqrt n)^2\Rightarrow \lim_{n\rightarrow \infty} \operatorname {Var}(R_n) =0\\ &\text{skewness} =\frac{2\sqrt{\pi}(\pi - 3)}{(4-\pi)^{3/2}} \approx 0.6311\\ &\text {excess kurtosis} =-\frac{6\pi^2 - 24\pi +16}{(4-\pi)^2} \approx 0.2451 \end{align}$$

matching the Monte Carlo results. Indeed, the skewness and kurtosis do not depend on the scale parameter $\alpha$, and hence not on $n$.
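The closed-form moments are easy to evaluate directly. In this minimal sketch (the function name `rayleigh_moments` is hypothetical), only the mean and variance depend on $\sigma$ and $n$, through $\alpha = \sigma/\sqrt n$; the skewness and excess kurtosis are pure constants. With $\sigma = 1$ and $n = 1$ it reproduces $E[R(1)] = \sqrt{\pi/2} \approx 1.2533$ from the question, and for any $n$ it gives skewness $\approx 0.6311$ and excess kurtosis $\approx 0.2451$.

```python
import math


def rayleigh_moments(sigma, n):
    """Closed-form mean, variance, skewness and excess kurtosis of R_n,
    a Rayleigh variable with scale alpha = sigma / sqrt(n)."""
    alpha = sigma / math.sqrt(n)
    mean = alpha * math.sqrt(math.pi / 2.0)
    var = (4.0 - math.pi) / 2.0 * alpha ** 2
    # The shape statistics below do not involve alpha at all.
    skew = 2.0 * math.sqrt(math.pi) * (math.pi - 3.0) / (4.0 - math.pi) ** 1.5
    ex_kurt = -(6.0 * math.pi ** 2 - 24.0 * math.pi + 16.0) / (4.0 - math.pi) ** 2
    return mean, var, skew, ex_kurt
```

Quadrupling $n$ halves the mean and divides the variance by 4, while the last two entries stay fixed.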

Alecos Papadopoulos
  • Interesting: I should have guessed that this was Rayleigh distributed when I saw constant skewness and kurtosis with those values. I wonder how many distributions can be identified via constant moments? – feetwet May 02 '14 at 23:38
  • The Logistic and Exponential distributions are two distributions that have higher moments independent of their parameters. – Alecos Papadopoulos May 03 '14 at 00:17
  • Generalizations for correlated and biased cases are described in http://stats.stackexchange.com/a/185204/22077 – Felipe G. Nievinski Dec 05 '15 at 18:50
  • @AlecosPapadopoulos - since you nailed this, I would be tremendously grateful if you could take a look at [this related question](https://stats.stackexchange.com/q/278564/34792). – feetwet May 26 '17 at 14:56