I was under the impression that the reason that the normal distribution occurs naturally could be explained by the central limit theorem (CLT). I recently watched a video that described a derivation of the normal probability density function using the case of darts clustered around the center of a dartboard. As far as I can tell, the example is distinct from the CLT.

The assumptions about the distribution of darts are as follows:

  1. Darts are clustered around a center and are less dense further from the center.
  2. The distribution of darts in the X direction is statistically independent of the distribution of darts in the Y direction.
  3. The density of darts is the same at every point with a given radius $r = \sqrt{x^2+y^2}$, so it depends only on the distance from the center and not on the direction.

If you told me that dart throws are normally distributed, I would be unfazed (something something CLT, presumably). But it is shocking to me that these very natural assumptions for a two-dimensional distribution are met uniquely by the normal distribution. Is it a coincidence that the normal distribution appears in this context when there doesn't appear to be any connection to the central limit theorem? Or is there a deeper connection?

Ryan Volpi
  • You say that the conditions are met uniquely by a bivariate Gaussian, but why wouldn’t independent $t$ distributions meet the conditions? That certainly meets the first two conditions ($t$s taper off, even if slower than normal distributions, and they’d be independent of one another) and I think the third. – Dave Jul 05 '20 at 01:25
  • Independent $t$ don't satisfy the third condition: they have outliers preferentially near the horizontal and vertical. – Thomas Lumley Jul 05 '20 at 01:32
  • @Dave I only say it because the normal pdf can be mathematically derived from these assumptions alone (unless I missed something?) as demonstrated in the linked video. I had the same thought as you about the *t* distribution. – Ryan Volpi Jul 05 '20 at 01:52

1 Answer

It is surprising, and also inconvenient for mathematical statistics, that there aren't more distributions satisfying all these conditions. I'm not sure how deep it is. The bivariate Normal is the maximum entropy distribution for a given mean and covariance, which is why it's the CLT limit, and entropy is unchanged by rotations, so in that sense there's a reason.

Why aren't there others? The issue is that condition 2 says $X$ and $Y$ are independent, and condition 3 imposes strict conditions on the relationship between $X$ and $Y$, and these don't go together.
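A quick numerical sanity check makes the tension concrete. This is only a sketch, not a proof: it takes the joint density of independent $X$ and $Y$ to be the product of the marginals, evaluates that product around a circle of fixed radius, and reports the ratio of the largest to the smallest value. The function names, the radius of 3, and the choice of 3 degrees of freedom for the $t$ are assumptions picked for illustration.

```python
import math


def normal_pdf(x):
    """Standard Normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def t_pdf(x, df=3.0):
    """Student t density with df degrees of freedom (df=3 is an arbitrary choice)."""
    coef = math.gamma((df + 1.0) / 2.0) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2.0)
    )
    return coef * (1.0 + x * x / df) ** (-(df + 1.0) / 2.0)


def spread_on_circle(pdf, r, n_angles=360):
    """Max/min ratio of pdf(x) * pdf(y) over evenly spaced points on a circle.

    A ratio of 1 means the product density is constant on that circle,
    which is what condition 3 requires.
    """
    values = []
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        x, y = r * math.cos(theta), r * math.sin(theta)
        values.append(pdf(x) * pdf(y))
    return max(values) / min(values)


normal_ratio = spread_on_circle(normal_pdf, r=3.0)
t_ratio = spread_on_circle(t_pdf, r=3.0)

print(f"Normal: max/min on circle = {normal_ratio:.6f}")
print(f"t(3):   max/min on circle = {t_ratio:.6f}")
```

For the Normal the ratio is 1 up to rounding error, while for independent $t_3$ variables it is about 2.44, with the largest values on the axes and the smallest on the diagonals. That matches the comment above that independent $t$ have outliers preferentially near the horizontal and vertical.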

Suppose $X$ and $Y$ are independent (condition 2). We also want $r$ and $\theta$ in polar coordinates to be independent, with $\theta$ uniform (condition 3). So the joint density has to be given both by $f(x)f(y)$ in rectangular coordinates and by some $g(r)$ in polar coordinates (not depending on $\theta$).

Transforming the density $g(r)$ from polar to rectangular coordinates brings in the Jacobian factor $1/r$, giving $$\frac{g(\sqrt{x^2+y^2})}{\sqrt{x^2+y^2}}$$ so we will only satisfy condition 2 if this expression happens to factor into $f(x)f(y)$, which is obviously not going to happen very often, but does happen for the Normal pdf (which has a square to undo the square root, then an exponential to turn addition into multiplication).
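To see roughly why the factorisation is so restrictive, here is a sketch of the usual functional-equation argument. It relies on added assumptions that are not stated above, namely that $f$ is continuous and strictly positive, and the symbols $h$, $\varphi$, $c$ and $\sigma$ are introduced here only for illustration. Write $h(r)$ for the joint density in rectangular coordinates as a function of the radius (the role played by $g(\sqrt{x^2+y^2})/r$ above), so that conditions 2 and 3 together say

$$f(x)f(y) = h\left(\sqrt{x^2+y^2}\right).$$

Setting $y = 0$ gives $h(|x|) = f(0)f(x)$, so $f$ is symmetric and

$$f(x)f(y) = f(0)\,f\left(\sqrt{x^2+y^2}\right).$$

Now define $\varphi(t) = \log\{f(\sqrt{t})/f(0)\}$ for $t \ge 0$. Taking logs and putting $s = x^2$, $t = y^2$ turns the previous line into Cauchy's equation

$$\varphi(s+t) = \varphi(s) + \varphi(t),$$

whose only continuous solutions are $\varphi(t) = ct$. Hence

$$f(x) = f(0)\,e^{c x^2}.$$

For this to integrate to 1 (and to be less dense further from the center, condition 1) we need $c < 0$. Writing $c = -1/(2\sigma^2)$ and normalising gives $f(0) = 1/(\sigma\sqrt{2\pi})$, which is the $N(0,\sigma^2)$ density.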

There's a full proof as one answer to this question about surprising characterisations of the Normal distribution.

Thomas Lumley
  • Thank you for the insightful response! I was missing the connection to maximum entropy, so that point was especially helpful. – Ryan Volpi Jul 05 '20 at 16:18