
Throughout, I use $(\mu, \sigma^2)$ to denote a distribution with mean $\mu$ and variance $\sigma^2$; prefixing $\mathcal{N}$, as in $\mathcal{N}(\mu, \sigma^2)$, denotes the normal distribution.

Let's suppose $X_1, \dots, X_n\overset{\text{iid}}{\sim}(\mu, \sigma^2)$ with $\sigma^2 < \infty$. The formal statement of the central limit theorem (CLT) says that $$\dfrac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}\overset{d}{\to}\mathcal{N}(0, 1)\text{.}$$ It's discussed here that the statement $$\bar{X}_n \sim \mathcal{N}(\mu, \sigma^2/n)$$ is not a statement about convergence in distribution, but rather an approximation. This approximation is frequently cited as being pretty decent when $n \geq 30$.

Now, we could go one step further: since $$\dfrac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} = \dfrac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}\text{,}$$ the CLT equally suggests the approximate statement $$\sum_{i=1}^{n}X_i\sim\mathcal{N}(n\mu, n\sigma^2)\text{.}\tag{1}$$

Given that $(1)$ isn't the actual CLT, I wonder how well this approximation performs. Does it perform well in general? Honestly, I'd be concerned about this in the case of a particularly skewed distribution.
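For concreteness, here is a minimal simulation sketch of the kind of check I have in mind; the Exponential(1) population (so $\mu = \sigma^2 = 1$, with skewness 2) and the sample sizes are just illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 30, 100_000                      # n = 30 is the usual rule of thumb
# Exponential(1) draws: mean mu = 1, variance sigma^2 = 1, skewness 2
sums = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)

# K-S distance between the simulated sums and the N(n*mu, n*sigma^2) approximation
ks = stats.kstest(sums, "norm", args=(n * 1.0, np.sqrt(n)))
print(f"K-S distance for n = {n}: {ks.statistic:.4f}")
```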

If this is too broad, I can close this.

– Clarinetist
  • This is a very well crafted question. You might find the answers have already appeared, though. Try this search: https://stats.stackexchange.com/search?q=esseen. – whuber Dec 08 '17 at 20:17
  • You might want to look at the Berry-Esseen theorem for information about the rate of convergence. The term "good approximation" is subjective. To be precise, define what the maximum distance between the approximating distribution and the standard normal needs to be for the approximation to be "good". – Michael R. Chernick Dec 08 '17 at 20:36
  • I don't think $\bar{X}_n \sim N(\mu, \sigma^2/n)$ is accurate enough as notation (even with the additional text explaining it). A better notation would be "$\bar{X}_n$ is $AN(\mu, \sigma^2/n)$", with the understanding that it means $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ converges to $N(0, 1)$ in distribution. – Zhanxiong Dec 08 '17 at 21:46
  • @Zhanxiong: the notation I'm more familiar with is $\bar X_n \dot \sim N(\mu, \sigma^2/n)$, with $\dot \sim$ meaning "approximately distributed". – Cliff AB Dec 08 '17 at 21:49
  • Empirically, the approximation quality depends on the underlying distribution of $X_i$. Intuitively, the approximation works better for symmetric and continuous r.v.s. For example, you may need much smaller $n$ to get a decent normal approximation for $X \sim \text{Bin}(1, 0.5)$ than for $X \sim \text{Bin}(1, 0.01)$. – Zhanxiong Dec 08 '17 at 21:49
  • @Cliff AB Personally, I am not a big fan of the "$\overset{\cdot}{\sim}$" notation either; it is somewhat confusing and makes people think $\bar{X}_n$ "has" the normal distribution rather than converges to a normal distribution. Of course, the difference may not be substantial for applications. – Zhanxiong Dec 08 '17 at 21:53
  • For an approximation which is often better, look into the saddlepoint approximation, see https://stats.stackexchange.com/questions/191492/how-does-saddlepoint-approximation-work – kjetil b halvorsen Dec 10 '17 at 18:56
  • I would suggest simulation to check how good the approximation is in your case. Actual constants in the Berry-Esseen inequality are far smaller than the theoretical value of roughly $0.5$. E.g., see https://stats.stackexchange.com/questions/30468/error-in-normal-approximation-to-a-uniform-sum-distribution – Viktor Dec 19 '17 at 14:25

1 Answer


Going the other way around: if the Z-score truly followed a standard normal distribution, then your subsequent approximations would be exact. The degree of error should therefore roughly scale with some measure of distance between the distribution of the Z-score and the standard Gaussian.

We can use the Kolmogorov-Smirnov (K-S) distance as our metric on the space of CDFs. Say we collect $n$ samples, and the (unknown) true CDF of the Z-score of these $n$ samples, $F_{Z_n}$, sits at K-S distance $\epsilon_n$ from the standard normal CDF $\Phi$: $$\max_z |F_{Z_n}(z) - \Phi(z)| = \epsilon_n\text{.}$$
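As a concrete illustration of how $\epsilon_n$ behaves, here is a minimal Monte Carlo sketch; the Exponential(1) population (so $\mu = \sigma = 1$) is assumed purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0                                    # Exponential(1): mu = sigma = 1
for n in (5, 30, 200):
    x = rng.exponential(scale=1.0, size=(100_000, n))
    z = (x.mean(axis=1) - mu) / (sigma / np.sqrt(n))    # Z-scores of the sample means
    eps_n = stats.kstest(z, "norm").statistic           # estimates max_z |F_Zn(z) - Phi(z)|
    print(f"n = {n:4d}:  epsilon_n ~ {eps_n:.4f}")
```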

Now, going from $F_{Z_n}(z)$ to $F_{S_n}(s)$, where $S_n = \sum_{i=1}^{n} X_i$, involves only a change of location and scale (i.e., an invertible linear transformation $L$ of the argument of $F_{Z_n}$), since $S_n = \sigma\sqrt{n}\,Z_n + n\mu$. The same transformation takes $\Phi(z)$ to the CDF of a normal random variable with the same mean and variance as $S_n$. In fact, we apply exactly the same map to both CDFs, $F_{Z_n}(z) \mapsto F_{Z_n}(L^{-1}z)$ and $\Phi(z) \mapsto \Phi(L^{-1}z)$; because each distribution's argument undergoes the same monotone transformation, vertical distances are preserved.
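A quick numerical check of this distance-preservation claim (again assuming an Exponential(1) population for illustration): on the same simulated draws, the K-S statistic of the Z-scores against $\Phi$ and that of the sums against $\mathcal{N}(n\mu, n\sigma^2)$ coincide, because the two comparisons differ only by the same monotone linear map.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n = 1.0, 1.0, 30
x = rng.exponential(scale=1.0, size=(100_000, n))
s = x.sum(axis=1)                            # S_n
z = (s - n * mu) / (sigma * np.sqrt(n))      # Z_n, a monotone linear map of S_n

d_z = stats.kstest(z, "norm").statistic                                       # vs. Phi
d_s = stats.kstest(s, "norm", args=(n * mu, sigma * np.sqrt(n))).statistic    # vs. N(n*mu, n*sigma^2)
print(d_z, d_s)                              # equal up to floating-point rounding
```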

So, the K-S distance between $F_{S_n}$ and its normal approximation converges to zero at the same rate as the distance between $F_{Z_n}$ and $\Phi$. However, $F_{S_n}$ itself has no limiting distribution (its mean $n\mu$ and variance $n\sigma^2$ grow without bound; when $\mu = 0$, for example, the pointwise limit is the constant $F(s) = 0.5$, which is not a distribution function), whereas $F_{Z_n}$ converges to an actual distribution function, $\Phi$.