38

If $\mathbf{x}$ and $\mathbf{y}$ are two independent random unit vectors in $\mathbb{R}^D$ (uniformly distributed on a unit sphere), what is the distribution of their scalar product (dot product) $\mathbf x \cdot \mathbf y$?

I guess that as $D$ grows, the distribution quickly (?) becomes normal with zero mean and a variance that decreases in higher dimensions, $$\lim_{D\to\infty}\sigma^2(D) = 0,$$ but is there an explicit formula for $\sigma^2(D)$?

Update

I ran some quick simulations. First, generating 10000 pairs of random unit vectors for $D=1000$, it is easy to see that the distribution of their dot products looks Gaussian (in fact it is quite Gaussian already for $D=100$); see the subplot on the left. Second, for each $D$ ranging from 1 to 10000 (with increasing steps) I generated 1000 pairs and computed the variance. The log-log plot is shown on the right, and it is clear that the variance is very well approximated by $1/D$. Note that for $D=1$ and $D=2$ this formula even gives exact results (but I am not sure what happens later).
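A minimal sketch of this kind of simulation, assuming NumPy is available. The function names and the sample sizes are illustrative choices, not the code that produced the plots.

```python
import numpy as np


def random_unit_vectors(n, dim, rng):
    """Draw n points uniformly on the unit sphere in R^dim.

    A standard Gaussian vector is rotation invariant, so normalizing it
    gives a uniformly distributed direction.
    """
    z = rng.standard_normal((n, dim))
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def dot_product_variance(dim, n_pairs=5000, seed=0):
    """Empirical variance of x . y over n_pairs independent pairs."""
    rng = np.random.default_rng(seed)
    x = random_unit_vectors(n_pairs, dim, rng)
    y = random_unit_vectors(n_pairs, dim, rng)
    dots = np.einsum("ij,ij->i", x, y)  # row-wise dot products
    return dots.var()


if __name__ == "__main__":
    for dim in (1, 2, 3, 10, 100):
        # Second and third columns should be close if the variance is 1/D.
        print(dim, dot_product_variance(dim), 1 / dim)
```

The sample sizes are kept small so that the arrays fit comfortably in memory; larger $D$ can be handled by generating the pairs in batches.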

[Figure: dot products between random unit vectors]

amoeba
  • @KarlOskar: thank you, this link is very relevant, and in fact renders my question *almost* a duplicate, but not quite. So there is an explicit formula for $P\{(\mathbf{x}, \mathbf{y})>\epsilon\}$ which is a cumulative distribution function of the dot products. One can take a derivative to get the PDF and then study the $D\to \infty$ limit. However, the formula is given in terms of beta functions and incomplete beta functions, so the calculations are likely to be nasty. – amoeba Feb 09 '14 at 11:23
  • @KarlOskar: from the uniform distribution on a unit sphere in $\mathbb{R}^D$. To generate a random vector from this distribution, one can generate a random vector from a Gaussian with a unit variance, and then normalize it. – amoeba Feb 09 '14 at 12:27

3 Answers

41

Because (as is well-known) a uniform distribution on the unit sphere $S^{D-1}$ is obtained by normalizing a $D$-variate normal distribution and the dot product $t$ of normalized vectors is their correlation coefficient, the answers to the three questions are:

  1. $u= (t+1)/2$ has a Beta$((D-1)/2,(D-1)/2)$ distribution.

  2. The variance of $t$ equals $1/D$ (as speculated in the question).

  3. The standardized distribution of $t$ approaches normality at a rate of $O\left(\frac{1}{D}\right).$


Method

The exact distribution of the dot product of unit vectors is easily obtained geometrically, because this is the component of the second vector in the direction of the first. Since the second vector is independent of the first and is uniformly distributed on the unit sphere, its component in the first direction is distributed the same as any coordinate of the sphere. (Notice that the distribution of the first vector does not matter.)

Finding the Density

Letting that coordinate be the last, the density at $t \in [-1,1]$ is therefore proportional to the surface area lying at a height between $t$ and $t+dt$ on the unit sphere. That proportion occurs within a belt of height $dt$ and radius $\sqrt{1-t^2},$ which is essentially a conical frustum constructed out of an $S^{D-2}$ of radius $\sqrt{1-t^2},$ of height $dt$, and slope $1/\sqrt{1-t^2}$. Whence the probability is proportional to

$$\frac{\left(\sqrt{1 - t^2}\right)^{D-2}}{\sqrt{1 - t^2}}\,dt = (1 - t^2)^{(D-3)/2} dt.$$
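To spell out where this expression comes from, here is a short sketch. The symbol $\omega_{D-2}$, for the surface area of the unit $S^{D-2}$, is introduced here only for illustration. The cross-section of the sphere at height $t$ is an $S^{D-2}$ of radius $\sqrt{1-t^2}$, whose area scales as the $(D-2)$nd power of its radius. Writing $t=\cos\theta$, a change $dt$ in height corresponds to an arc length $d\theta = dt/\sqrt{1-t^2}$ along the sphere, which is the width of the belt. Multiplying the two gives the area of the belt:

$$dA = \omega_{D-2}\left(\sqrt{1-t^2}\right)^{D-2}\cdot\frac{dt}{\sqrt{1-t^2}} = \omega_{D-2}\,(1-t^2)^{(D-3)/2}\,dt.$$

As a check, for $D=3$ this reduces to $dA = 2\pi\,dt$: the height of a uniform point on the ordinary sphere is uniformly distributed on $[-1,1]$.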

Letting $u=(t+1)/2 \in [0,1]$ entails $t = 2u-1$. Substituting that into the preceding gives the probability element up to a normalizing constant:

$$f_D(u)du \; \propto \; (1 - (2u-1)^2)^{(D-3)/2} d(2u-1) = 2^{D-2}(u-u^2)^{(D-3)/2}du.$$

It is immediate that $u=(t+1)/2$ has a Beta$((D-1)/2, (D-1)/2)$ distribution, because (by definition) its density also is proportional to

$$u^{(D-1)/2-1}\left(1-u\right)^{(D-1)/2-1} = (u-u^2)^{(D-3)/2} \; \propto \; f_D(u).$$
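The Beta claim can be checked numerically. The following is a rough sketch, assuming NumPy; the helper names, the bin count, and the sample size are arbitrary choices. It compares a histogram of simulated values of $u=(t+1)/2$ with the Beta$((D-1)/2,(D-1)/2)$ density.

```python
import math

import numpy as np


def beta_pdf(u, a):
    """Density of a symmetric Beta(a, a) distribution on [0, 1]."""
    norm = math.gamma(2 * a) / math.gamma(a) ** 2
    u = np.asarray(u, dtype=float)
    return norm * u ** (a - 1) * (1 - u) ** (a - 1)


def sample_u(dim, n_pairs, rng):
    """Simulate u = (x . y + 1) / 2 for independent uniform unit vectors."""
    x = rng.standard_normal((n_pairs, dim))
    y = rng.standard_normal((n_pairs, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    y /= np.linalg.norm(y, axis=1, keepdims=True)
    return (np.sum(x * y, axis=1) + 1) / 2


def max_histogram_error(dim, n_pairs=200_000, bins=40, seed=0):
    """Largest gap between the histogram of u and the Beta density.

    Intended for dim >= 3; for dim = 2 the density is unbounded at the
    endpoints, so comparing at bin centers is not meaningful there.
    """
    rng = np.random.default_rng(seed)
    u = sample_u(dim, n_pairs, rng)
    heights, edges = np.histogram(u, bins=bins, range=(0, 1), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    a = (dim - 1) / 2
    return float(np.max(np.abs(heights - beta_pdf(centers, a))))


if __name__ == "__main__":
    for dim in (3, 5, 10):
        print(dim, max_histogram_error(dim))
```

The reported gaps should be small compared with the height of the density and should shrink as the number of simulated pairs grows.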

Determining the Limiting Behavior

Information about the limiting behavior follows easily from this using elementary techniques: $f_D$ can be integrated to obtain the constant of proportionality $\frac{\Gamma \left(\frac{D}{2}\right)}{\sqrt{\pi } \Gamma \left(\frac{D-1}{2}\right)}$; $t^k f_D(t)$ can be integrated (using properties of Beta functions, for instance) to obtain moments, showing that the variance is $1/D$ and shrinks to $0$ (whence, by Chebyshev's Theorem, the probability is becoming concentrated near $t=0$); and the limiting distribution is then found by considering values of the density of the standardized distribution, proportional to $f_D(t/\sqrt{D}),$ for small values of $t$:

$$\eqalign{ \log(f_D(t/\sqrt{D})) &= C(D) + \frac{D-3}{2}\log\left(1 - \frac{t^2}{D}\right) \\ &=C(D) -\left(\frac{1}{2} - \frac{3}{2D}\right)t^2 + O\left(\frac{t^4}{D}\right) \\ &\to C -\frac{1}{2}t^2 }$$

where the $C$'s represent (log) constants of integration. Evidently the rate at which this approaches normality (for which the log density equals $-\frac{1}{2}t^2$) is $O\left(\frac{1}{D}\right).$
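These three statements (the normalizing constant, the variance $1/D$, and the $O(1/D)$ rate) can be sanity-checked numerically. The sketch below assumes NumPy and uses a plain midpoint rule; the function names and grid sizes are illustrative only.

```python
import math

import numpy as np


def log_const(dim):
    """log of Gamma(D/2) / (sqrt(pi) * Gamma((D-1)/2))."""
    return (math.lgamma(dim / 2) - 0.5 * math.log(math.pi)
            - math.lgamma((dim - 1) / 2))


def density_t(t, dim):
    """Density of the dot product t on (-1, 1)."""
    t = np.asarray(t, dtype=float)
    return np.exp(log_const(dim) + 0.5 * (dim - 3) * np.log1p(-t ** 2))


def moments(dim, n_grid=200_001):
    """Total mass and variance of t by the midpoint rule (use dim >= 4)."""
    edges = np.linspace(-1.0, 1.0, n_grid)
    mid = (edges[:-1] + edges[1:]) / 2
    h = edges[1] - edges[0]
    f = density_t(mid, dim)
    return float(np.sum(f) * h), float(np.sum(mid ** 2 * f) * h)


def max_gap_to_normal(dim):
    """Largest gap on [-3, 3] between the density of sqrt(D) * t and the
    standard normal density (use dim >= 10 so that 3 / sqrt(D) < 1)."""
    s = np.linspace(-3.0, 3.0, 601)
    dens = density_t(s / math.sqrt(dim), dim) / math.sqrt(dim)
    normal = np.exp(-0.5 * s ** 2) / math.sqrt(2 * math.pi)
    return float(np.max(np.abs(dens - normal)))


if __name__ == "__main__":
    for dim in (10, 100, 400):
        total, var = moments(dim)
        # dim * var should be near 1; dim * gap should be roughly constant.
        print(dim, total, dim * var, dim * max_gap_to_normal(dim))
```

If the rate is $O(1/D)$, quadrupling $D$ should cut the gap to the normal density by roughly a factor of four.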

Figure

This plot shows the densities of the dot product for $D=4, 6, 10$, as standardized to unit variance, and their limiting density. The values at $0$ increase with $D$ (from blue through red, gold, and then green for the standard normal density). The density for $D=1000$ would be indistinguishable from the normal density at this resolution.

whuber
  • 6
    (+1) Thank you very much, @whuber, this is a great answer! Special thanks for mentioning the word "frustum". It so happens that I have accepted another answer just minutes before you posted yours, and I wouldn't like to de-accept it now; hope you understand. Pity that it's not possible to accept both! By the way, note a very simple proof of $1/D$ expression for variance from that answer: one can see it directly without messing around with beta functions! Variance of the dot product is equal to variance of any sphere coordinate (as you wrote), and a sum of all $D$ of them should be $1$, Q.E.D. – amoeba Feb 09 '14 at 21:58
  • 2
    That's a nice observation about the variances. – whuber Feb 09 '14 at 22:30
  • @Amoeba Thank you very much for finding this and bringing it to my attention. You are correct; I thoroughly mixed up the roles of $u$ and $t$. I have gone through and straightened that out, adding some details of the calculation to check on the correctness. – whuber Mar 13 '15 at 19:15
  • Thanks for the edit! By the way, the reason I came back to this thread is [this today's question](http://stats.stackexchange.com/questions/141611) that is very much related. I started answering it, building up on the work you did here, but have not so far managed to perform all the computations. Perhaps you will see a simpler way! – amoeba Mar 13 '15 at 21:52
  • 2
    @amoeba, the recent activity brought my attention here again as well, and as much as I appreciate that you accepted my answer, this one is so much fuller. I wouldn't mind at all if you changed. – ekvall Mar 14 '15 at 20:16
  • 1
    @Student001: this is a fair and generous comment. I switched the accepted answer. I have also found one Q and one A of yours to upvote to make up for it :) – amoeba Mar 14 '15 at 21:50
  • @whuber Could you please tell me what is the distribution of $t$ ? – tam Nov 19 '15 at 17:02
  • 1
    @mat The distribution of $t$ is that of $2U-1$. That makes it a Beta distribution that has been scaled and shifted from the interval $[0,1]$ into the interval $[-1,1]$. – whuber Nov 19 '15 at 22:01
  • @whuber I would really appreciate if you could give me the parameters of this beta distribution. Also, the distribution of $|x \cdot y|^2$ ($x$ and $y$ are complex vectors) follows directly ? if not, I can post this as a separate question. Thank you!! – tam Nov 19 '15 at 22:18
  • @mat The parameters are right here in my answer, so I don't know what else to supply. – whuber Nov 20 '15 at 04:46
  • @whuber you mean $t$ and $u$ have the same distribution? So the scale by $2$ and the shift by $-1$ don't change anything in the distribution ? – tam Nov 20 '15 at 07:39
  • @mat Of course I don't mean that! I have explicitly stated the relationship between $U$ and $t$. It is such a simple one--it is linear--that writing a formula for the distribution of one of these variables in terms of the other--using a pdf, the cdf, the mgf, the cf, the cgf, or any other means--is easy and automatic. – whuber Nov 20 '15 at 15:21
  • @whuber Somebody asked me about this issue, I came back to your answer and noticed that the equality sign in your first centered formula should rather be $\propto$ because you are omitting a constant factor with some power of $\pi$, right? By the way, a brief explanation of how this formula arises could be really helpful (the linked frustum article only gives formulas for the 3D case). – amoeba Jul 21 '16 at 14:27
  • @Amoeba The equality is correct, because it expresses an algebraic identity. The proportionality is stated in words and exhibited at the beginning of the next formula right after "$f_D(u)du$". I'll reformat it to make that clearer. I thought I had provided a brief explanation of the formula in the preceding paragraph, so I would be glad to know what might be missing from that explanation. – whuber Jul 21 '16 at 15:21
  • @whuber Thanks for the edit. The equality that was bothering me (I must have been a bit unclear in my previous comment) has now been edited out and you are now saying "Whence the probability is proportional to", so I am fine with that. Regarding the missing explanation, I just meant that it might be unclear to many readers why this "whence" is true. I assume you are referring to the formula for the surface area of an $n$-sphere, which is proportional to $r^n$, but this is not very explicit. – amoeba Jul 21 '16 at 21:07
  • there seems to be an elegant formula for this quantity for a general gaussian -- http://stats.stackexchange.com/questions/263896/moment-mgf-of-cosine-of-two-random-vectors – Yaroslav Bulatov Mar 02 '17 at 00:51
  • @whuber do you know a similar result for the dot products on the unit d-*ball* of uniform density? I.e. with vector lengths $||v||\sim U[0;1]^{1/d}$? Thank you. – Erich Schubert Feb 06 '20 at 12:04
  • @Erich Such a dot product can be expressed as the product of two independent lengths and an independent cosine of the angle. The lengths have Beta(d,1) distributions, whence the PDF of their product is $t\to -d^2t^{d-1}\log(t)$ and, using that along with the result here, you can obtain an expression for the pdf of the entire dot product. – whuber Feb 06 '20 at 16:00
13

Let's find the distribution; the variance then follows by standard results. Consider the dot product and write it in its cosine form, i.e. note that we have $$P(x'y\leq t)=P(|x||y|\cos\theta\leq t)=P(\cos\theta\leq t)=\mathbb{E}P(\cos\theta\leq t\mid y),$$ where $\theta$ is the angle between $x$ and $y$. In the last step I have used that for any events $A$ and $B$ $$\mathbb EP(A\mid B):=\mathbb{E}[\mathbb{E}[\chi_A\mid B]]=\mathbb{E}\chi_A=P(A).$$ Now consider the term $P(\cos\theta\leq t\mid y)$. Since $x$ is chosen uniformly with respect to the sphere surface, it does not matter what $y$ actually is; only the angle between $x$ and $y$ matters. Thus, the term inside the expectation is actually constant as a function of $y$ and we can w.l.o.g. assume that $y=[1,0,0,\dots ]'.$ Then we get that $$P(x'y\leq t)=P\left( x_1\leq t\right).$$ Since $x_1$ is the first coordinate of a normalized Gaussian vector in $\mathbb{R}^n,$ we have that $x'y$ is asymptotically Gaussian with variance $1/n$ by invoking the asymptotic result of this paper.

For an explicit result for the variance, use the fact that the dot product has mean zero by independence and, as shown above, is distributed like the first coordinate of $x$. By these results, finding $\text{Var}(x'y)$ amounts to finding $\mathbb{E}x_1^2$. Now, note that by construction $x'x=1$ and so we can write $$1=\mathbb{E}x'x=\mathbb{E}\sum_{i=1}^nx_i^2=\sum_{i=1}^n\mathbb{E}x_i^2=n\mathbb{E}x_1^2,$$ where the last equality follows from the fact that the coordinates of $x$ are identically distributed. Putting things together, we have found that $\text{Var}(x'y)=\mathbb{E}x_1^2=1/n$.
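As a quick check of $\mathbb{E}x_1^2=1/n$ in the lowest dimensions, where the distribution of $x_1$ is far from Gaussian, the value can be computed directly. In the sketch below $\varphi$ denotes a uniform angle on $[0,2\pi)$, a symbol introduced only for this illustration; the case $n=3$ uses the fact that a coordinate of a uniform point on the ordinary sphere is uniform on $[-1,1]$.

$$\eqalign{ n=1:&\quad x_1=\pm 1, \quad \mathbb{E}x_1^2 = 1, \\ n=2:&\quad x_1=\cos\varphi, \quad \mathbb{E}x_1^2 = \frac{1}{2\pi}\int_0^{2\pi}\cos^2\varphi\,d\varphi = \frac{1}{2}, \\ n=3:&\quad x_1\sim U[-1,1], \quad \mathbb{E}x_1^2 = \frac{1}{2}\int_{-1}^{1}t^2\,dt = \frac{1}{3}. }$$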

ekvall
  • Thank you, but I am confused: what exactly is "the desired result" and how does it follow from the last equation? The final probability distribution should depend on $D$. – amoeba Feb 09 '14 at 15:19
  • Actually how the result follows from your last equation is exactly what is discussed on [math.SE thread](http://math.stackexchange.com/questions/469650/probability-of-the-dot-product-between-gaussian-unit-vectors) that you found. It involves beta distributions etc., and the limiting behaviour is (to me) far from obvious. I guess there should be a simpler direct way to see that $\sigma^2(D) \approx 1/D$. – amoeba Feb 09 '14 at 15:24
  • It does depend on the dimension since $x_1=z_1 |z|^{-1}$, where $z$ is the generated Gaussian vector. I'll update the answer later today or tomorrow. – ekvall Feb 09 '14 at 16:11
  • Wow, great, your last link provides the limit of that expression involving inverse beta functions (which I was afraid to compute) in the third equation on page 1. So to complete the reasoning: if the sphere has radius $\sqrt{D}$, then $x_1$ is (asymptotically) distributed as $\mathcal{N}(0,1)$. Which means that for sphere of unit radius variance is $D$ times smaller, i.e. $1/D$. However, I still have a concern: I checked for $D$ from 1 to 4, and $1/D$ seems to give *exact* variance, even though distributions for D=1 or D=2 are very far from normal. There should be a deeper reason behind that. – amoeba Feb 09 '14 at 16:43
  • 1
    @amoeba Yes, updated with a proof of that. – ekvall Feb 09 '14 at 19:00
2

To answer the first part of your question, denote $Z = \langle X,Y \rangle = \sum X_i Y_i$. Define $$ f_{Z_i}(z_i) = \int_{-\infty}^\infty f_{Z_1,\ldots,Z_D}(z_1,\ldots,z_D) \: d z_i $$ The product of the $i^{th}$ elements of $X$ and $Y$, denoted here as $Z_i$, will be distributed according to the joint distribution of $X_i$ and $Y_i$: $$ f_{Z_i}(z_i) = \int_{-\infty}^\infty f_{X_i,Y_i}(x,\frac{z_i}{x})\frac{1}{|x|}dx $$ Then, since $Z = \sum Z_i$, $$ f_Z(z) = \int_{-\infty}^\infty \ldots \int_{-\infty}^\infty f_{Z_1,\ldots,Z_D} (z_1,\ldots,z_D) \: \delta(z - \sum z_i)\: dz_1\ldots d z_D $$

For the second part, I think that if you want to say anything interesting about the asymptotic behaviour of $\sigma$ you need to at least assume independence of $X$ and $Y$, and then apply a CLT.

For instance if you were willing to assume that the $\{Z_1,\ldots,Z_D\}$ are i.i.d with $\mathbb{E}[Z_i] = \mu$ and $\mathbb{V}[Z_i] = \sigma^2$ you could say that $\sigma^2(D) = \frac{\sigma^2}{D}$ and $\lim_{D\to\infty} \sigma^2(D) = 0$.

tom
  • Thank you, but I am confused about the second part. $X$ and $Y$ are of course supposed to be independent, I will add this to the question. You say that $\sigma^2(D) = \mathrm{Var}(z_i)/D$, and that sounds reasonable, but what is the asymptotic behaviour of $\mathrm{Var}(z_i)$? I think the expression I am searching for should depend only on $D$. By the way in 2D $\mathrm{Var}(z_i)=1/2$ if I am not mistaken, I wonder if this remains true in higher dimensions... – amoeba Feb 09 '14 at 09:07
  • Is it really possible for the $z_i$ to be independent given the requirement that $X$ and $Y$ are of unit length? – ekvall Feb 09 '14 at 09:59
  • @tom: By the way, I *was* mistaken: in 2D $\mathrm{Var}(z_i)$ is 1, it is $\mathrm{Var}(z)$ that is equal 1/2. I have updated my question with some simulation results. Seems like the correct formula is $1/D$. – amoeba Feb 09 '14 at 12:42