(Originally posted on MSE.)
I have seen many heuristic discussions of the classical central limit theorem speak of the normal distribution (or any of the stable distributions) as an "attractor" in the space of probability densities. For example, consider these sentences at the top of Wikipedia's treatment:
In more general usage, a central limit theorem is any of a set of weak-convergence theorems in probability theory. They all express the fact that a sum of many independent and identically distributed (i.i.d.) random variables, or alternatively, random variables with specific types of dependence, will tend to be distributed according to one of a small set of attractor distributions. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution.
This dynamical systems language is very suggestive. Feller also speaks of "attraction" in his treatment of the CLT in his second volume (I wonder if that is the source of the language), and Yuval Filmus in this note even speaks of the "basin of attraction." (I don't think he really means "the exact form of the basin of attraction is deducible beforehand" but rather "the exact form of the attractor is deducible beforehand"; still, the language is there.) My question is: can these dynamical analogies be made precise? I don't know of a book in which they are -- though many books do make a point of emphasizing that the normal distribution is special for its stability under convolution (as well as its stability under the Fourier transform). This is basically telling us that the normal is important because it is a fixed point. The CLT goes further, telling us that it is not just a fixed point but an attractor.
To make this geometric picture precise, I imagine taking the phase space to be a suitable infinite-dimensional function space (the space of probability densities) and the evolution operator to be repeated convolution with an initial condition. But I have no sense of the technicalities involved in making this picture work or whether it is worth pursuing.
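For concreteness, here is a rough sketch of the kind of map I have in mind. It assumes, purely for illustration, that the initial density $f$ has mean zero and unit variance; the symbols $f_n$, $T$ and $\varphi$ are notation introduced here, not taken from any of the sources above. The rescaled $n$-fold convolution, i.e. the density of $(X_1 + \dots + X_n)/\sqrt{n}$ for i.i.d. $X_i$ with density $f$, is

$$
f_n(x) = \sqrt{n}\, f^{*n}\!\left(\sqrt{n}\, x\right), \qquad f^{*n} = \underbrace{f * \cdots * f}_{n \text{ times}} .
$$

Restricting to $n = 2^k$ gives an honest iteration of a single map, the density of $(X + X')/\sqrt{2}$ for independent copies $X, X'$:

$$
(Tf)(x) = \sqrt{2}\,(f * f)\!\left(\sqrt{2}\, x\right), \qquad T^k f = f_{2^k}, \qquad T\varphi = \varphi \ \text{ for } \ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} .
$$

In this notation the classical CLT says that $f_n$ converges weakly to $\varphi$, and the question becomes whether $\varphi$ can be treated as an attracting fixed point of $T$ in a suitable topology on the space of densities.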
I would guess that since I can't find a treatment that does pursue this approach explicitly, there must be something wrong with my sense that it can be done or that it would be interesting. If that is the case, I would like to hear why.
EDIT: There are three similar questions on Math Stack Exchange and MathOverflow that readers may be interested in: