
Let $U_1,\dots,U_n$ be i.i.d., uniform on the Euclidean unit sphere in $\mathbb{R}^d$, which we denote $S^{d-1}$. I am looking for the law of $$M=\left\|\frac{1}{n}\sum_{i=1}^n U_i \right\|.$$

Attempts:

I don't know if it is possible to determine the law exactly. I need a lower bound on $M$, so I tried U-statistics deviation bounds, but the bound I obtained is not sufficient: I want to be able to say that, with some small probability, $M$ is greater than $\sqrt{\lambda/n}$ for some $\lambda\le n$ (with U-statistics I am limited to $\lambda \le 1$).

  • U-statistics approach: if we take the square, we obtain $$M^2=\left\|\frac{1}{n}\sum_{i=1}^n U_i \right\|^2 = \frac{1}{n} +\frac{1}{n^2}\sum_{i \neq j} \langle U_i, U_j\rangle .$$ For one individual term $\langle U_i, U_j\rangle$ we know the law (related to *Distribution of scalar products of two random unit vectors in $D$ dimensions* and https://math.stackexchange.com/questions/2977867/x-coordinate-distribution-on-the-n-sphere), but these terms are not independent, hence handling the sum is not easy.

  • Another possibility is to write $$M^2=\left\|\frac{1}{n}\sum_{i=1}^n U_i \right\|^2 = \frac{1}{n^2} +\frac{2}{n^2}\left\langle U_1, \sum_{i=2}^n U_i\right\rangle + \left\|\frac{1}{n}\sum_{i=2}^n U_i \right\|^2 .$$ Then, by independence of the $U_i$'s, we have $$\mathbb{P}\left(M^2 \le x\right)=\mathbb{E}\left[ \mathbb{P}\left(\frac{1}{n^2} +\frac{2}{n^2}\left\langle U_1, \sum_{i=2}^n U_i\right\rangle + \left\|\frac{1}{n}\sum_{i=2}^n U_i \right\|^2 \le x \,\Big|\, U_2,\dots, U_n\right)\right]. $$ Writing $S=\sum_{i=2}^n U_i$, conditionally on $U_2,\dots,U_n$ we have $\langle U_1, S\rangle = \|S\| Z$, where $Z$ is the projection of $U_1$ onto a fixed unit vector, and the law of $(1+Z)/2$ is a $\mathrm{Beta}\left(\frac{d-1}{2}, \frac{d-1}{2}\right)$ distribution. Hence $$\mathbb{P}\left(M^2 \le x\right)=\mathbb{E}\left[ F_{\mathrm{Beta}\left(\frac{d-1}{2},\frac{d-1}{2}\right)}\left( \frac{1}{2}+\frac{n^2x-1-\|S\|^2}{4\|S\|} \right)\right], $$ and from this I want to find an upper bound, but I don't think I will succeed with this line of thought (a Monte Carlo check of this identity is sketched below).
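A minimal R sketch to check this Beta-mixture identity by Monte Carlo: it compares a direct estimate of $\mathbb{P}(M^2 \le x)$ with the right-hand side above. The sample sizes and the choices of $n$, $d$, $x$ below are arbitrary.

set.seed(1)
n <- 10; d <- 5; x <- 0.15

### uniform sample from the sphere S^{d-1}
simsphere <- function(d) { z <- rnorm(d); z/sqrt(sum(z^2)) }

### left side: direct Monte Carlo estimate of P(M^2 <= x)
M2 <- replicate(10^4, { S <- rowSums(replicate(n, simsphere(d))); sum((S/n)^2) })
lhs <- mean(M2 <= x)

### right side: E[ F_Beta( 1/2 + (n^2 x - 1 - |S|^2)/(4|S|) ) ], with S = U_2 + ... + U_n
rhs <- mean(replicate(10^4, {
  S <- rowSums(replicate(n - 1, simsphere(d)))
  s <- sqrt(sum(S^2))
  pbeta(1/2 + (n^2*x - 1 - s^2)/(4*s), (d - 1)/2, (d - 1)/2)
}))

c(lhs, rhs)  # the two estimates should agree up to Monte Carlo error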

TMat

3 Answers


Similarity with a rubber band model.

This problem got me thinking of the model for a 'rubber band' (see, for instance, Wikipedia or Section 3-7 in Herbert B. Callen's *Thermodynamics and an Introduction to Thermostatistics*).

[Image: example rubber band]

With a bit of hand waving:

  • consider the distribution of only one axis/component of the $U_i$ (the marginal distribution along a single axis)
  • approximate this with a normal distribution (the approximation becomes more accurate for larger $n$ and also for larger $d$)
  • because the distribution needs to be spherically symmetric, we assume the other components to be identically and independently distributed.
  • The squared length will then be the sum of $d$ squared normally distributed variables.

Then the distribution of $M$ is approximately that of a scaled $\chi$-distributed variable with $d$ degrees of freedom. The scaling factor is $1/\sqrt{dn}$.

To know the scaling, we need the variance of the distribution of a single component. The contribution of each $U_{i}$ follows some sort of beta distribution (it is only the projection of $U_{i}$ onto the axis that matters):

$$f(x) \propto (1-x^2)^\frac{d-3}{2} \quad \text{for $-1 \leq x \leq 1$}$$

or with $t = x^2$

$$f(t) \propto t^{-\frac{1}{2}}(1-t)^\frac{d-3}{2}\quad \text{for $0 \leq t \leq 1$}$$

This means that the mean of $T$, i.e. the variance of $X$, is equal to $1/d$ (the mean of a beta distribution with $\alpha = 1/2$ and $\beta = (d-1)/2$ is $\alpha/(\alpha+\beta) = 1/d$). For the mean of $n$ of those variables, the variance is $1/(dn)$.
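As a quick sanity check of that $1/d$ value (a sketch, not part of the derivation; the dimension and sample size are arbitrary):

### variance of a single coordinate of a uniform point on the sphere; should be ~ 1/d
d <- 10
x1 <- replicate(10^5, { z <- rnorm(d); (z/sqrt(sum(z^2)))[1] })
c(var(x1), 1/d)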

Simulation

Below is an example for the case of $n=20$ and $d=4$.

[Image: histogram of simulated $M$ with the scaled $\chi$ density overlaid]

This is computed with the following R code:

n = 20
d = 4

### sample a uniform point on the sphere S^{d-1}
simsphere <- function(d) {
  x <- rnorm(d)
  x <- x/sqrt(sum(x^2))
  return(x)
}

### sum 'n' samples of U and compute the norm of the mean
getM <- function(n,d) {
  xv <- replicate(n,simsphere(d))
  vector <- rowSums(xv)
  norm <- sum(vector^2)^0.5
  return(norm/n)
}

### chi distribution density with 'nu' degrees of freedom
dchi <- function(x,nu) {
  x^(nu-1)*exp(-1/2*x^2)/(2^((nu/2)-1)*gamma(nu/2))
}
dchi <- Vectorize(dchi)

### simulate and plot histogram
M <- replicate(10^4, getM(n,d))
hist(M, breaks = seq(0,5,0.01), freq = 0, xlim = c(0,0.5))

### overlay the approximation based on the scaled chi distribution
v = 1/d/n
ms <- seq(0,10,0.01)
lines(ms,dchi(ms/sqrt(v),d)/sqrt(v), col = 1, lwd = 2)
Sextus Empiricus
  • This is nice, but the hand-waving part is a little problematic. Indeed, I don't know how your approximation would fare in high dimensions, and this is what I am interested in. Maybe with some kind of Berry–Esseen type bound the approximation error could be computed? As it is, I can't be sure that the approximation error will not get worse and worse as the dimension increases. – TMat Dec 12 '20 at 07:47
  • @TMat what sort of application requires you to have a rigorous bound for the error of an estimate? – Sextus Empiricus Dec 12 '20 at 09:26
  • Two things are certain: **1** the distribution needs to be spherically symmetric, some function solely of the distance; **2** the marginal distribution needs to be distributed as the mean of $n$ beta distributions (generalized by reflecting them about the y-axis to get a distribution on $[-1, 1]$). – Sextus Empiricus Dec 12 '20 at 10:00
  • This marginal distribution is going to be normally distributed in the limit, which makes it easy to find the spherically symmetric function (it needs to be a joint normal distribution). If you do not take the limit, then I suspect that you get some piecewise-defined function, as with the mean of a sample from a [regular beta distribution](https://stats.stackexchange.com/a/390879/)... – Sextus Empiricus Dec 12 '20 at 10:06
  • ... The geometric image that I have now is that the marginal distribution of $f(x_1,x_2,\dots,x_d) \propto f(r)/r^{d-1}$ on a coordinate $x_1$ is found by integrating over the other coordinates, which form a $(d-1)$-dimensional spherical surface of radius $\sqrt{r^2-x_1^2}$, so the relation between the distribution of the distance $r$ (which you called $M$) and the distribution of a single coordinate is relatively straightforward. – Sextus Empiricus Dec 12 '20 at 10:13
  • I will see if I can add an example of this idea/logic (the simple relationship between the distribution of the radius and the marginal distribution) to the answer. – Sextus Empiricus Dec 12 '20 at 10:13
  • You say that in the limit this is normal... but what limit? When $n$ goes to infinity? When $d$ goes to infinity? If $d$ is much larger than $n$, I wonder if there is not a problem with your approximation. There is nothing that says that your approximation is good in high dimensions; your argument only says that it is good for $n$ large enough. – TMat Dec 12 '20 at 11:01
  • @TMat I meant the limit for $n\to \infty$ but also for larger $d$ the distribution will be closer to a normal distribution. – Sextus Empiricus Dec 12 '20 at 11:16
  • How do you know that the distribution will be close to a normal if $d$ is large? The coordinates of $U$ are not independent; this is not an application of the CLT, and it can get rather complicated. – TMat Dec 12 '20 at 11:21
  • I tried your R code with n=2 and d=100 and the approximation was not good at all for instance. – TMat Dec 12 '20 at 13:45

The paper *Densities of short uniform walks in higher dimensions* could be relevant. It discusses random walks of $n$ steps, each of length $1$, in $\mathbb{R}^d$, where each step is taken in a uniformly random direction.

Theorem 2.1 states that the probability density function of the distance to the origin in $d \ge 2$ dimensions after $n \ge 2$ steps is, for $x \gt 0$,

$$p_n(\nu; x) = \frac{2^{-\nu}}{\nu!}\int_0^\infty (tx)^{\nu+1}J_\nu(tx)\,j_\nu^n(t)\,dt$$

Where:

  • $\nu = \frac{d}{2}-1$
  • $J_\nu$ means the Bessel function of the first kind
  • $j_\nu(x) = \nu!(\frac{2}{x})^\nu J_\nu(x)$ is the "normalized Bessel function of the first kind"

The norm of the mean of the steps is just the final distance from the origin divided by $n$. So you could theoretically plug $x = ny$ into the formula, and multiply by the Jacobian factor $n$, to get the value of the pdf of $M$ at $y$. But it looks like it could be computationally difficult. (Theorem 2.10 apparently gives a "computationally more accessible" expression for the pdf.)
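For what it is worth, here is a rough R sketch of that computation: it evaluates the Theorem 2.1 integral numerically with base R's `besselJ` and rescales to the density of $M = X/n$. The truncation point `upper` and the quadrature settings are arbitrary choices, and the oscillatory integrand makes the quadrature delicate, so treat this as illustrative only.

### normalized Bessel function j_nu(t) = nu! (2/t)^nu J_nu(t), with j_nu(0) = 1
jnu <- function(t, nu) ifelse(t == 0, 1, gamma(nu + 1)*(2/t)^nu*besselJ(t, nu))

### Theorem 2.1: pdf of the distance to the origin after n unit steps in d dimensions
p_n <- function(x, n, d, upper = 100) {
  nu <- d/2 - 1
  integrand <- function(t) (t*x)^(nu + 1)*besselJ(t*x, nu)*jnu(t, nu)^n
  2^(-nu)/gamma(nu + 1)*integrate(integrand, 0, upper, subdivisions = 10^4)$value
}

### density of M = X/n at y: plug x = n*y and multiply by the Jacobian factor n
dM <- function(y, n, d) n*p_n(n*y, n, d)

dM(0.1, 20, 4)  # compare with the chi approximation from the other answer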

fblundun

I'm not sure whether you want an upper or a lower bound; you mention both in the question. The easiest way to get a very loose upper bound on this problem is to use Markov's inequality, so here it is, just in case it has been overlooked.

$$M=\left\|\frac{1}{n}\sum_{i=1}^n U_i \right\|.$$

$$P\left(M \ge \sqrt{\frac{\lambda}{n}}\right) \le \sqrt{\frac{n}{\lambda}}E[M]$$ by Markov's inequality.

Then by Jensen's inequality, $$E[M] < \sqrt{E[M^2]}.$$

So, the final bound is $$P\left(M \ge \sqrt{\frac{\lambda}{n}}\right) \le \frac{\sqrt{\sum_i \sum_j E\left[ \langle U_i,U_j \rangle \right]}}{\sqrt{\lambda n}}.$$

Finding the expectation of the inner product of two independent $U_i$ and $U_j$ shouldn't be too hard: by symmetry it is $0$ for $i \neq j$ (and $1$ for $i = j$), so the double sum equals $n$ and the bound reduces to $1/\sqrt{\lambda}$. Again, this will be a very loose bound, but it's better than no bound (sometimes).
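And as a quick numerical check of that expectation (a sketch; the choice of $d$ and the sample size are arbitrary):

### E<U_i, U_j> for two independent uniform points on the sphere; ~ 0 by symmetry
simsphere <- function(d) { z <- rnorm(d); z/sqrt(sum(z^2)) }
d <- 8
ip <- replicate(10^5, sum(simsphere(d)*simsphere(d)))
mean(ip)  # close to 0, so the double sum reduces to the n diagonal terms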

MentatOfDune
  • Yes, I already looked into this. The $U_i$ are bounded, so we can expect much better concentration than what you exhibit. Also, I was more interested in a lower bound. – TMat Dec 15 '20 at 17:38
  • I thought you might have. I just wanted to put it out there for completeness's sake. – MentatOfDune Dec 15 '20 at 17:43