A dynamical systems view of the Central Limit Theorem?

Question

(Originally posted on MSE.)

I have seen many heuristic discussions of the classical central limit theorem speak of the normal distribution (or any of the stable distributions) as an "attractor" in the space of probability densities. For example, consider these sentences at the top of Wikipedia's treatment:

In more general usage, a central limit theorem is any of a set of weak-convergence theorems in probability theory. They all express the fact that a sum of many independent and identically distributed (i.i.d.) random variables, or alternatively, random variables with specific types of dependence, will tend to be distributed according to one of a small set of attractor distributions. When the variance of the i.i.d. variables is finite, the attractor distribution is the normal distribution.

This dynamical systems language is very suggestive. Feller also speaks of "attraction" in his treatment of the CLT in his second volume (I wonder if that is the source of the language), and Yuval Filmus in this note even speaks of the "basin of attraction." (I don't think he really means "the exact form of the basin of attraction is deducible beforehand" but rather "the exact form of the attractor is deducible beforehand"; still, the language is there.) My question is: can these dynamical analogies be made precise? I don't know of a book in which they are, though many books do make a point of emphasizing that the normal distribution is special for its stability under convolution (as well as its stability under the Fourier transform). This is basically telling us that the normal is important because it is a fixed point. The CLT goes further, telling us that it is not just a fixed point but an attractor.
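
Both fixed-point properties can be checked in one line each. A short worked check, assuming the normalized sum $(X_1+X_2)/\sqrt{2}$ of i.i.d. copies as the convolution step, $X \sim N(0,1)$ with characteristic function $\varphi(t) = e^{-t^2/2}$, and the unitary convention $\hat f(\xi) = (2\pi)^{-1/2}\int f(x)e^{-ix\xi}\,dx$ for the Fourier transform (these normalizations are my choice, not the question's):

$$\varphi_{(X_1+X_2)/\sqrt{2}}(t) = \varphi\!\left(\tfrac{t}{\sqrt{2}}\right)^2 = \left(e^{-t^2/4}\right)^2 = e^{-t^2/2} = \varphi(t),$$

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,e^{-ix\xi}\,dx = e^{-\xi^2/2}.$$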

To make this geometric picture precise, I imagine taking the phase space to be a suitable infinite-dimensional function space (the space of probability densities) and the evolution operator to be repeated convolution with an initial condition, suitably rescaled. But I have no sense of the technicalities involved in making this picture work or whether it is worth pursuing.
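
A minimal numerical sketch of this picture, under assumptions of my own: densities are sampled on a grid over $[-8, 8]$, and the evolution operator is taken to be $f \mapsto$ the density of $(X_1+X_2)/\sqrt{2}$ (rescaled self-convolution, which doubles the number of summands at each step). The names `step` and `l1_to_normal` are hypothetical. Starting from the uniform density on $[-\sqrt{3}, \sqrt{3}]$, the orbit should approach the standard normal in $L^1$ while the variance stays near 1:

```python
import numpy as np

# Hypothetical discretisation: densities sampled on a uniform grid over [-L, L].
L, N = 8.0, 4001                                 # odd N puts x = 0 on the grid
x = np.linspace(-L, L, N)
h = x[1] - x[0]
x_sum = np.linspace(-2 * L, 2 * L, 2 * N - 1)    # grid carrying f * f (same spacing h)

def step(f):
    """Evolution operator: density of (X1 + X2)/sqrt(2) for X1, X2 i.i.d. with density f."""
    g = np.convolve(f, f) * h                    # density of S = X1 + X2 on x_sum
    # Change of variables y = s/sqrt(2): f_Y(y) = sqrt(2) * f_S(sqrt(2) * y).
    f_new = np.sqrt(2.0) * np.interp(np.sqrt(2.0) * x, x_sum, g, left=0.0, right=0.0)
    return f_new / (f_new.sum() * h)             # undo small interpolation drift in total mass

def l1_to_normal(f):
    """L^1 distance between f and the standard normal density, by a Riemann sum."""
    phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    return np.sum(np.abs(f - phi)) * h

# Initial condition: uniform on [-sqrt(3), sqrt(3)], which has mean 0 and variance 1.
f = np.where(np.abs(x) <= np.sqrt(3.0), 1.0, 0.0)
f /= f.sum() * h

dists = [l1_to_normal(f)]
for n in range(6):
    f = step(f)
    dists.append(l1_to_normal(f))
print("L1 distance to N(0,1) along the orbit:", np.round(dists, 4))
print("variance after 6 steps:", np.sum(x**2 * f) * h)
```

Self-convolution plus rescaling gives an autonomous map, which is what the dynamical language needs; convolving with the initial density and rescaling by $\sqrt{n}$ at step $n$ tracks the same normalized sums, but then the rescaling depends on $n$.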

I would guess that since I can't find a treatment that does pursue this approach explicitly, there must be something wrong with my sense that it can be done or that it would be interesting. If that is the case, I would like to hear why.

EDIT: There are three similar questions on Math Stack Exchange and MathOverflow that readers may be interested in:

Welcome to Cross Validated and thanks for the interesting question (and answer)! — Matt Krause, May 04 '16 at 05:22
Thanks very much for this interesting question. Perhaps the following reference is also relevant. I didn't see it among the others here: On The Central Limit Theorem For Dynamical Systems (Burton & Denker, 1987) https://www.ams.org/journals/tran/1987-302-02/S0002-9947-1987-0891642-6/ — wil3, Aug 17 '21 at 03:33

symplectomorphic · Answer 1 · 2018-10-19

After doing some digging in the literature, encouraged by Kjetil's answer, I've found a few references that do take the geometric/dynamical systems approach to the CLT seriously, besides the book by Y. Sinai. I'm posting what I've found for others who may be interested, but I hope still to hear from an expert about the value of this point of view.

The most significant influence seems to have come from the work of Charles Stein. But the most direct answer to my question seems to be from Hamedani and Walter, who put a metric on the space of distribution functions and show that convolution generates a contraction, which yields the normal distribution as the unique fixed point.

M. Anshelevich, The linearization of the central limit operator in free probability theory, arXiv:math/9810047v2.
L.H.Y. Chen, L. Goldstein, and Q. Shao, Normal Approximation by Stein's Method, Springer, 2011.
J.A. Goldstein, Semigroup-theoretic proofs of the central limit theorem and other theorems of analysis, Semigroup Forum 12 (1976), no. 3, 189–206.
G.G. Hamedani and G.G. Walter, A fixed point theorem and its application to the central limit theorem, Arch. Math. (Basel) 43 (1984), no. 3, 258–264.
S. Swaminathan, Fixed-point-theoretic proofs of the central limit theorem, in Fixed Point Theory and Applications (Marseille, 1989), Pitman Res. Notes Math. Ser., vol. 252, Longman Sci. Tech., Harlow, 1991, pp. 391–396. Cited in Karl Stromberg, Probability for Analysts, page 114.
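
One standard way to obtain such a contraction, which I am not claiming is the metric Hamedani and Walter use, works with characteristic functions. For laws with mean 0, variance 1 and finite third moment, put $d_3(F,G) = \sup_{t \neq 0} |\varphi_F(t)-\varphi_G(t)|/|t|^3$. The averaging map sends $\varphi$ to $\varphi(t/\sqrt{2})^2$, and since $|\varphi^2-\psi^2| \le 2|\varphi-\psi|$ and $|t|^3 = 2\sqrt{2}\,|t/\sqrt{2}|^3$, it is a contraction with constant $1/\sqrt{2}$. It fixes $N(0,1)$, so that is its only fixed point in this class, and every orbit converges to it in $d_3$. The sketch below (hypothetical names `d3` and `T`; the supremum is taken over a finite grid, so it is only an approximation) checks this numerically with the uniform and Rademacher laws:

```python
import numpy as np

# Positive frequencies only: the laws below are symmetric, so their c.f.s are even.
# The grid starts slightly above 0 because the ratio is 0/0 at t = 0.
T_GRID = np.linspace(0.01, 40.0, 200_001)

def d3(phi, psi, t=T_GRID):
    """Fourier-based distance sup_t |phi(t) - psi(t)| / |t|^3, evaluated on a grid.

    Finite when the two laws share mean and variance and have finite third moments.
    """
    return np.max(np.abs(phi(t) - psi(t)) / t**3)

def T(phi):
    """C.f. of (X1 + X2)/sqrt(2) when X1, X2 are i.i.d. with c.f. phi."""
    return lambda t: phi(t / np.sqrt(2.0)) ** 2

def gaussian(t):      # N(0, 1), the fixed point
    return np.exp(-t**2 / 2)

def uniform(t):       # uniform on [-sqrt(3), sqrt(3)]; np.sinc(x) = sin(pi x)/(pi x)
    return np.sinc(np.sqrt(3.0) * t / np.pi)

def rademacher(t):    # +1 or -1 with probability 1/2 each
    return np.cos(t)

# One step of T shrinks the distance between two different laws ...
before = d3(uniform, rademacher)
after = d3(T(uniform), T(rademacher))
print(f"d3 before T: {before:.5f}, after T: {after:.5f}, ratio {after / before:.3f}")

# ... and iterating T drives the uniform law towards the fixed point N(0, 1).
phi, dists = uniform, []
for n in range(8):
    dists.append(d3(phi, gaussian))
    phi = T(phi)
print("d3(T^n uniform, N(0,1)):", np.round(dists, 6))
```

The measured one-step ratio is typically well below $1/\sqrt{2}$, because the bound $|\varphi+\psi| \le 2$ is far from tight where the difference $|\varphi-\psi|$ is largest.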

ADDED October 19, 2018.

Another source for this point of view is Oliver Knill's Probability and Stochastic Processes with Applications, p. 11 (emphasis added):

Markov processes often are attracted by fixed points of the Markov operator. Such fixed points are called stationary states. They describe equilibria and often they are measures with maximal entropy. An example is the Markov operator $P$, which assigns to a probability density $f_Y$ the probability density of $f_{\overline{Y+X}}$ where $\overline{Y+X}$ is the random variable $Y + X$ normalized so that it has mean $0$ and variance $1$. For the initial function $f= 1$, the function $P^n(f_X)$ is the distribution of $S^{*}_n$ the normalized sum of $n$ IID random variables $X_i$. This Markov operator has a unique equilibrium point, the standard normal distribution. It has maximal entropy among all distributions on the real line with variance $1$ and mean $0$. The central limit theorem tells that the Markov operator $P$ has the normal distribution as a unique attracting fixed point if one takes the weaker topology of convergence in distribution on $\mathcal{L}^1$. This works in other situations too. For circle-valued random variables for example, the uniform distribution maximizes entropy. It is not surprising therefore, that there is a central limit theorem for circle-valued random variables with the uniform distribution as the limiting distribution.
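
Knill's closing remark about circle-valued variables can be illustrated with a discrete stand-in, which is my assumption rather than his setting: replace the circle by the cyclic group $\mathbb{Z}_M$ (here $M = 12$) and take an arbitrary step law $p$ whose support contains 0 and 1, so it is neither periodic nor confined to a subgroup. The law of $S_n \bmod M$ should then converge to the uniform law, which is a fixed point of convolution with $p$. The names `M`, `p` and `circular_convolve` are hypothetical:

```python
import numpy as np

M = 12                                  # hypothetical: the circle replaced by the cyclic group Z_M
p = np.zeros(M)
p[[0, 1, 5]] = [0.5, 0.3, 0.2]          # an arbitrary step law; support contains 0 and 1

def circular_convolve(a, b):
    """Law of (U + V) mod M for independent U ~ a and V ~ b, computed with the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

uniform = np.full(M, 1.0 / M)
law, tv = p.copy(), []                  # law starts as the law of S_1
for n in range(1, 61):
    tv.append(0.5 * np.abs(law - uniform).sum())   # total variation distance to uniform
    law = circular_convolve(law, p)                 # law of S_{n+1} mod M
print("TV distance to uniform for n = 1, 10, 60:", tv[0], tv[9], tv[-1])
```

Since convolving with $p$ is a Markov operator with the uniform law as a stationary state, the total variation distance to uniform can never increase along the orbit; the speed is governed by the largest $|\hat p(j)|$ with $j \neq 0$.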

kjetil b halvorsen · Answer 2 · 2016-05-04

The text "Probability Theory: An Introductory Course" by Y. Sinai (Springer) discusses the CLT in this way.

http://www.springer.com/us/book/9783662028452

The idea is (from memory ...) that

1) The normal distribution maximizes entropy (among distributions with fixed variance).
2) The averaging operator $A(x_1,x_2) = \frac{x_1+x_2}{\sqrt{2}}$, applied to independent, identically distributed $x_1, x_2$, maintains variance and increases entropy.

... and the rest is technique. So you then get the dynamical systems setting of iteration of an operator.
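
A worked instance of these two points, assuming i.i.d. $x_1, x_2$ uniform on $[-\sqrt{3}, \sqrt{3}]$ (mean 0, variance 1; the example is mine, not from the book). One application of $A$ gives the triangular density $(\sqrt{6}-|x|)/6$ on $[-\sqrt{6}, \sqrt{6}]$, which still has variance $(\sqrt{6})^2/6 = 1$, and the differential entropies line up as the argument predicts:

$$H(\text{uniform}) = \ln(2\sqrt{3}) \approx 1.2425 \;<\; H(\text{triangular}) = \tfrac{1}{2} + \ln\sqrt{6} \approx 1.3959 \;<\; H(N(0,1)) = \tfrac{1}{2}\ln(2\pi e) \approx 1.4189.$$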

edited May 04 '16 at 07:39

answered May 03 '16 at 15:38

Thanks for the reference. A quick glance suggests there is a unique treatment there. Also, a little Googling (of CLT + "fixed point") has pointed me to Stein's method, which seems to be one way of making all this precise (and generalizing it far beyond the stringent hypotheses of the classical CLT). – symplectomorphic May 04 '16 at 02:54

Albert · Answer 3 · 2020-04-04

Great question; I've often wondered about that. A somewhat related idea is explained in our paper "Dynamical attraction to stable processes" (Albert Fisher and Marina Talet), Ann. Inst. H. Poincaré Probab. Statist. 48 (2012), no. 2, 551–578; see https://www.ime.usp.br/~afisher/. The idea is to turn Lévy's probabilistic notion of "domain of attraction" for stable processes (including the Gaussian) into actual dynamics. We do this for the full stable process rather than just for the stable distribution, because there the scaling property of these self-similar processes has a dynamical interpretation: it is a Bernoulli flow of infinite entropy. A random walk with increments in the domain of attraction then converges to this flow in the sense that a walk path is almost surely a generic point for it. We don't use a contraction mapping per se, but it is an interesting question whether something like that might be useful. (Our theorem proves an almost sure invariance principle (a.s.i.p.) in log density; the regularly varying case is especially tricky, and there we have to apply an appropriate time change.) See also the related papers on the web page above.

The advantage of working with processes is that one has an actual flow. The Gaussian distribution is a fixed point not only of the Fourier transform but also of the convolution operator (suitably rescaled). The first is intriguing, but iteration cannot help there, because the Fourier transform is periodic: applying it twice gives the reflection $f(-x)$, which for an even function such as the Gaussian brings you straight back, and applying it four times is the identity. The convolution really makes sense, as that is just the distribution of a random walk. However, it gives a semigroup action, which is less sweet from the dynamical point of view than a flow.
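
The contrast between the two operators can be seen in a discrete setting, using the unitary DFT (`numpy.fft.fft` with `norm="ortho"`) as a stand-in for the Fourier transform; that substitution, and the helper name `F`, are my assumptions. Applied twice it reflects a vector ($v[n] \mapsto v[-n \bmod N]$), applied four times it is the identity, and a sampled Gaussian $e^{-\pi k^2/N}$ on centred indices is a fixed point to machine precision. So iterating it only cycles, with period at most four:

```python
import numpy as np

N = 256

def F(v):
    """Unitary DFT, used here as a discrete stand-in for the Fourier transform."""
    return np.fft.fft(v, norm="ortho")

rng = np.random.default_rng(0)
v = rng.standard_normal(N)                    # an arbitrary (non-even) vector

twice = F(F(v))
four_times = F(F(twice))
reflected = np.roll(v[::-1], 1)               # v[n] -> v[-n mod N]

# Sampled Gaussian exp(-pi k^2 / N) on centred indices k = 0, 1, ..., -1.
k = np.fft.fftfreq(N) * N
g = np.exp(-np.pi * k**2 / N)

print("F^2 v is the reflection of v:", np.allclose(twice, reflected))
print("F^4 v equals v:", np.allclose(four_times, v))
print("F g equals g:", np.allclose(F(g), g))
```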
