16

The Delta method states that, given

$$ \sqrt{n} (X_n - \mu) \xrightarrow{d} N(0, 1) $$

then

$$ \sqrt{n} (g(X_n) - g(\mu)) \xrightarrow{d} N(0, g'(\mu)^2) $$

I'm surprised that this can be true.

As a counter-example, consider a sequence of random variables $\{X_n\}$ in which each element is normally distributed with mean $\mu$ and variance $1/n$. This implies $\sqrt{n} (X_n - \mu) \xrightarrow{d} N(0, 1)$, as required by the Delta method.

With $g(X) = X^2$, every element in the sequence $\{g(X_n)\}$ is the square of a normal random variable, and thus should have a chi-square distribution. How can the sequence $\{g(X_n)\}$ become asymptotically normal, as the Delta method claims?

Even though I use the specific example of $g(X) = X^2$ here, my confusion applies to any $g(X)$, such as $1/X$, $\exp(X)$, etc. How can a sequence $\{1/X_n\}$ or $\{\exp(X_n)\}$ become asymptotically normal?
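In fact, a quick simulation (a NumPy sketch; the parameter values are my own arbitrary choices) suggests the Delta method's prediction does hold for $g(X) = X^2$, which is exactly what puzzles me:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, reps = 2.0, 10_000, 200_000

# Each draw is one realisation of X_n ~ N(mu, 1/n), as in my setup.
X_n = rng.normal(mu, 1 / np.sqrt(n), size=reps)

# The Delta method predicts sqrt(n)*(X_n^2 - mu^2) ~ N(0, (2*mu)^2) = N(0, 16).
T = np.sqrt(n) * (X_n**2 - mu**2)
print(T.mean(), T.std())  # near 0 and 2*mu = 4
```

The sample standard deviation comes out near $2\mu$, just as $N(0, 4\mu^2)$ predicts.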

Heisenberg
  • The lengthy answers below are better, but this very brief comment should still be helpful: Chi-square distributions _are_ asymptotically normal! More precisely, if you take a sequence of Chi-square distributions with growing parameters, appropriately normalized, then it will still converge to a normal distribution. The same is true for lots of other distributions (such as the familiar sums of iid random variables). This is part of the incredible ubiquity of normal distributions. – Greg Martin Jul 07 '20 at 17:33

4 Answers

15

There seems to be some confusion about what the Delta method really says.

The Delta method is fundamentally a statement about the asymptotic distribution of a function of an asymptotically normal estimator. In your examples the functions are applied to $X_n$, which could be built from any underlying distribution: in the case of the sample mean, asymptotic normality is ensured by the CLT for any $X$ that satisfies the CLT's assumptions. So one example could be $g(X_n) = X_n^2 = \bigg(\frac{1}{n}\sum_i X_i\bigg)^2$. The Delta method says that if $X_n$ is asymptotically normal around $\theta$, then $g(X_n)$ is asymptotically normal around $g(\theta)$.

To explicitly answer your scenario where $g(X_n) = X_n^2$, the point is that $g(X_n)$ is not chi-square. Suppose we draw $X_i$ iid from some distribution, and suppose that $Var(X_i) = 1$. Let's consider the sequence $\{g(X_n)\}_n$, where $g(X_n) = X_n^2 = \bigg(\frac{1}{n}\sum_i X_i\bigg)^2$. By the CLT, we have that $\sqrt{n}(X_n - \mu) \xrightarrow{d} N(0,1)$ (in your post, you get that distribution by construction, without needing to appeal to the CLT). But $X_n^2$ is not chi-square, because $X_n$ is not standard normal. Instead, $\sqrt{n}(X_n - \mu)$ is standard normal (either by the assumed distribution of $X_n$ or by the CLT), and we accordingly have that $$\big(\sqrt{n}(X_n - \mu)\big)^2 \xrightarrow{d} \chi^2_1$$
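This distinction is easy to check numerically; here is a minimal NumPy sketch (the parameter values are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 2.0, 1_000, 100_000

# X_n ~ N(mu, 1/n), mimicking the sample mean of n observations.
X_n = rng.normal(mu, 1 / np.sqrt(n), size=reps)

centered = (np.sqrt(n) * (X_n - mu))**2   # exactly chi-square(1): mean 1
uncentered = n * X_n**2                    # noncentral chi-square: mean 1 + n*mu^2

print(centered.mean())    # near 1
print(uncentered.mean())  # near 1 + n*mu^2 = 4001
```

Only the centered, rescaled quantity behaves like a central chi-square; $X_n^2$ itself does not.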

But that is not the quantity you are interested in; you want the distribution of $X_n^2$ itself, so for the sake of exploring, we can work it out. If $Z\sim N(\mu,\sigma^2)$, then $\frac{Z^2}{\sigma^2}$ follows a non-central chi-square distribution with one degree of freedom and non-centrality parameter $\lambda = (\frac{\mu}{\sigma})^2$. In your case (either by your assumption or by the CLT), we have $\sigma^2 = 1/n$, so $nX_n^2$ follows a non-central chi-square distribution with $\lambda = \mu^2n$, and hence $\lambda \to \infty$ as $n\to\infty$. I won't go through the proof, but if you check the Wikipedia page on non-central chi-square distributions, under Related Distributions, you'll note that for $Z$ non-central chi-square with $k$ degrees of freedom and non-centrality parameter $\lambda$, as $\lambda \to \infty$ we have that

$$\frac{Z - (k+\lambda)}{\sqrt{2(k+2\lambda)}} \xrightarrow{d} N(0,1) $$

In our case, $Z = nX_n^2,\lambda = \mu^2n,k = 1$, and so we have that as $n$ goes to infinity, we have that $$\frac{nX_n^2 - (1+\mu^2n)}{\sqrt{2(1+2\mu^2n)}} = \frac{n(X_n^2 - \mu^2) - 1}{\sqrt{2+4\mu^2n}} \xrightarrow{d} N(0,1)$$

I won't be formal, but since $n$ is getting arbitrarily large, it's clear that

$$\frac{n(X_n^2 - \mu^2) - 1}{\sqrt{2+4\mu^2n}} \approx \frac{n(X_n^2 - \mu^2)}{2\mu\sqrt{n}} = \frac{1}{2\mu}\sqrt{n}(X_n^2 - \mu^2)\xrightarrow{d} N(0,1) $$

and using normal properties, we thus have that $$\sqrt{n}(X_n^2 - \mu^2)\xrightarrow{d} N(0,4\mu^2) $$

Seems pretty nice! And what does Delta tell us again? Well, by Delta, with $g(\theta) = \theta^2$ we should have $$\sqrt{n}(X_n^2 - \mu^2)\xrightarrow{d} N(0, g'(\mu)^2) = N(0,(2\mu)^2) = N(0,4\mu^2)$$

Sweet! But all those steps were kind of a pain to do.. luckily, the univariate proof of the Delta method just approximates all this using a first-order Taylor expansion, as on the Wikipedia page for the Delta method, and it's just a few steps after that. From that proof, you can see that all you really need is for the estimator of $\theta$ to be asymptotically normal and for $g'(\theta)$ to be well-defined and non-zero. In the case where it is zero, you can try taking higher-order Taylor expansions, so you may still be able to recover an asymptotic distribution.
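That first-order Taylor logic can be captured in a tiny helper (a sketch of my own, not a standard library function):

```python
def delta_var(g_prime, theta, sigma2):
    """Asymptotic variance of sqrt(n)*(g(X_n) - g(theta)),
    assuming sqrt(n)*(X_n - theta) -> N(0, sigma2)."""
    return g_prime(theta) ** 2 * sigma2

# g(x) = x^2: variance (2*mu)^2, matching the long derivation above
print(delta_var(lambda t: 2 * t, 2.0, 1.0))      # 16.0

# g(x) = 1/x: variance 1/mu^4
print(delta_var(lambda t: -1 / t**2, 2.0, 1.0))  # 0.0625
```

One line of calculus replaces the whole non-central chi-square argument.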

doubled
    Thanks! I edited out the wrong statement of Delta method in my original post. I've also taken the liberty to edit your answer so that it no longer discusses the original, wrong statement. To your main answer, I still don't understand how each individual $g(X_n)$ is chi-square distributed, yet the sequence $\{g(X_n)\}$ can be asymptotically normal? – Heisenberg Jul 07 '20 at 00:16
  • @Heisenberg $\{g(X_n)\}$ is not chi square distributed. I updated my post. – doubled Jul 07 '20 at 01:09
  • Thanks for the clarification, but in my counterexample, my setup is that every $X_n$ IS normal, which implies that $\{X_n\}$ is asymptotically normal. (I understand that the reverse is not true.) – Heisenberg Jul 07 '20 at 01:21
    But in your example, $X_n \sim N(\mu,1/n)$. That does not mean that $X_n^2$ is chi squared. As I point out, it just means that $(\sqrt{n}(X_n - \mu))^2 \sim \chi^2$. – doubled Jul 07 '20 at 01:28
    I understand your explanation now! Thank you for being clear and patient! At the end, you note that this is a "special case" that proves the Delta method is still correct. If g(X) = 1/X, is there a similar argument where we can show that {g(X)} is also asymptotically normal? I'm trying to figure out whether Delta method can break down when given "weird enough" g(X). – Heisenberg Jul 07 '20 at 01:39
  • @Heisenberg you can break it with 1. functions for which the derivative does not exist at $g'(\theta)$, or 2. functions where $g'(\theta) = 0$, but that's just cause you get a degenerate distribution... and so you can consider the second taylor expansion (delta method is based on a first degree taylor expansion)! And I am currently editing my post to explain a bit more about the idea (hopefully I figure it out myself and then can write it hehe :)). – doubled Jul 07 '20 at 01:41
  • @Heisenberg yes, you'll get the same thing with $g(\theta)=1/\theta$ as long as $\theta\neq 0$. I updated my post to give a very thorough derivation of the case with $f(\theta) = \theta^2$, but the classic delta is proven with a couple of lines just using a first-order Taylor expansion.. it's a lot simpler that way :-) – doubled Jul 07 '20 at 02:20
    Just a small detail - asymptotic normality of the sample mean is ensured by the CLT only if $X$ has finite, well-defined variance and mean. So it's not quite "regardless of what $X$ is." Again, that's just a minor correction. It doesn't affect the correctness of your answer, I just wouldn't want passers-by to get the wrong impression of CLT. – kdbanman Jul 07 '20 at 18:27
  • @kdbanman yep fair point, updated my post slightly. – doubled Jul 07 '20 at 18:30
7

The Delta method says

$$\sqrt{n}(g(X_n)-g(\mu))\stackrel{d}{\to} N(0, g'(\mu)^2)$$

In your $g(X)=X^2$ example, there are two cases.

First, the degenerate case, when $\mu=0$ and thus $g'(\mu)=0$. The Delta method is correct if you interpret $N(0,0)$ as point mass at zero: $$\sqrt{n}(g(X_n)-g(\mu))\stackrel{p}{\to} 0$$

So, while $X_n^2$ is asymptotically $\chi^2_1$ after the right scaling, $$nX_n^2\stackrel{d}{\to}\chi^2_1,$$ it is also true that $$\sqrt{n}X_n^2\stackrel{d}{\to} 0.$$
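A short NumPy sketch of the degenerate case (parameter values mine) shows both limits at once:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100_000, 100_000

# Degenerate case: mu = 0, so X_n ~ N(0, 1/n).
X_n = rng.normal(0.0, 1 / np.sqrt(n), size=reps)

print((n * X_n**2).mean())           # near E[chi-square(1)] = 1
print((np.sqrt(n) * X_n**2).mean())  # near 1/sqrt(n), i.e. collapsing to 0
```

The same draws give a stable chi-square under the $n$ scaling and a point mass at zero under the $\sqrt{n}$ scaling.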

Second, the non-degenerate case really does give a Normal. Suppose you had $X_n\sim N(1,1/n)$, giving $\mu=1$. Write $Z_n=X_n-1$. Then $$X_n^2 = Z_n^2+2Z_n+1.$$ The $2Z_n$ term is Normal, and the $Z_n^2$ term is of order $1/n$, so it disappears when multiplied by $\sqrt{n}$. You have $$\sqrt{n}\left(X_n^2-1\right)= \sqrt{n}\left(Z_n^2+2Z_n\right)=\sqrt{n}Z_n^2+2\sqrt{n}Z_n$$

Now, just as in the first example, $\sqrt{n}Z_n^2\stackrel{d}{\to} 0$ and $2\sqrt{n}Z_n\stackrel{d}{\to} N(0,2^2)$. Combining those, $$\sqrt{n}\left(g(X_n)-g(\mu)\right)\stackrel{d}{\to}N(0, g'(\mu)^2)$$ as required.

That's basically what happens in all the non-degenerate cases: the term of highest order is Normal, and the non-Normal terms are asymptotically negligible.
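The decomposition is easy to verify by simulation; a NumPy sketch (constants mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10_000, 200_000

# Z_n = X_n - 1 ~ N(0, 1/n) for X_n ~ N(1, 1/n).
Z_n = rng.normal(0.0, 1 / np.sqrt(n), size=reps)

quad = np.sqrt(n) * Z_n**2    # the quadratic term: vanishes as n grows
lin = 2 * np.sqrt(n) * Z_n    # the linear term: exactly N(0, 2^2)

print(quad.std(), lin.std())  # near 0 and 2
```

The quadratic term's spread is of order $1/\sqrt{n}$, so the linear (Normal) term dominates.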

Third, trying to do this with $1/X_n$ for $X_n\sim N(0,1/n)$ fails because $g(x)=1/x$ does not have a continuous derivative at $\mu=0$ (which is the other key assumption of the Delta method).

For $X_n\sim N(\mu,1/n)$ with $\mu\neq 0$ you end up with the same sort of argument as my one for $g(x)=x^2$. By Taylor's theorem $$1/X_n=1/\mu - \frac{1}{\mu^2}(X_n-\mu) + r_n$$ so $$\sqrt{n}(1/X_n -1/\mu)=-\sqrt{n}\frac{1}{\mu^2}(X_n-\mu)+\sqrt{n}r_n$$ Now $r_n$ involves $(X_n-\mu)^2$, so $\sqrt{n}r_n\stackrel{d}{\to} 0$ in the same way as the first example, and $$-\sqrt{n}\frac{1}{\mu^2}(X_n-\mu)\sim N(0, 1/\mu^4)$$ So, $$\sqrt{n}(1/X_n -1/\mu)\stackrel{d}{\to}N(0, g'(\mu)^2)$$ as required.
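The same check works numerically for $g(x)=1/x$; a NumPy sketch (taking $\mu=2$, my arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, n, reps = 2.0, 10_000, 200_000   # mu != 0, away from the singularity of 1/x

X_n = rng.normal(mu, 1 / np.sqrt(n), size=reps)
T = np.sqrt(n) * (1 / X_n - 1 / mu)

print(T.std())   # near 1/mu^2 = 0.25, the sd predicted by g'(mu) = -1/mu^2
```

The sample standard deviation matches $|g'(\mu)| = 1/\mu^2$, as the Taylor argument predicts.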

Thomas Lumley
  • Could you elaborate a bit on what it means that the "$Z_n^2$ term is of order $1/n$"? What does order mean here? Also, could you point out where my counter-example went wrong? I don't understand how each individual $X_n^2$ is chi-square distributed, yet the sequence $\{X_n^2\}$ can be asymptotically normal? – Heisenberg Jul 07 '20 at 00:24
  • I'm also trying to apply your proof technique to show that $g(X) = 1/X$ is asymptotically normal, but couldn't do so successfully. – Heisenberg Jul 07 '20 at 00:41
6

A similar issue occurred in the question Implicit hypothesis testing: mean greater than variance and Delta Method.

The idea behind the Delta method is that it is a linear approximation, which becomes more and more accurate as the sample size increases. But this is only true when you are actually on a slope of the function $g(X)$. In your counter-example $g(X)=X^2$, the slope is zero at the mean when $\mu_X=0$, and then this is indeed not the case.
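This flattening can be made concrete: as the spread of $X$ shrinks, the average gap between $g(X)=X^2$ and its tangent line at $\mu$ decays like the variance. A NumPy sketch (values mine):

```python
import numpy as np

# As the spread of X shrinks, g(X) = X^2 is better and better approximated
# by its tangent line at mu -- exactly the linearisation the Delta method uses.
rng = np.random.default_rng(6)
mu = 1.0
errs = []
for sigma in [1.0, 0.1, 0.01]:
    X = rng.normal(mu, sigma, size=100_000)
    tangent = mu**2 + 2 * mu * (X - mu)         # first-order Taylor approximation
    errs.append(np.abs(X**2 - tangent).mean())  # equals E[(X - mu)^2] = sigma^2

print(errs)  # roughly [1, 0.01, 0.0001]: the error shrinks like sigma^2
```

The linearisation error is exactly $(X-\mu)^2$ here, which is why it vanishes faster than the $\sqrt{n}$ scaling can magnify it (when the slope at $\mu$ is non-zero).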

The following images illustrate this. Note that the distribution of the sample mean $X_n$ becomes narrower as $n$ increases, so the function $g(X)$ is effectively more linear or 'flat' at that scale, a bit in the same way the earth seems flat when you get closer to the surface and look at a smaller scale.

[Figure: intuitive illustration of the Delta method]

[Figure: the second-order case]

See more about those images in the answer to the aforementioned question:

https://stats.stackexchange.com/a/441688

Sextus Empiricus
2

Your $X_n^2$ does not have a (central) chi-squared distribution, because $X_n$ does not have a mean of $0$.

With $X_n \sim N(1, 1/n)$, $X_n^2$ instead has a scaled noncentral chi-squared distribution, with mean $1+\frac1n$ and variance $\frac4n +\frac2{n^2}$,

and so $Z_n =\sqrt{n}(X_n^2-1)$ has a relocated and scaled noncentral chi-squared distribution with mean $\frac1{\sqrt{n}}$, variance $4 +\frac2{n}$, and standard deviation $\sqrt{4+\frac2n}$. As $n$ grows, these converge to $0$, $4$, and $2$ respectively, as predicted by the Delta method: if $g(x)=x^2$ then $g'(1)=2$.

$Z_n$ does converge in distribution to the relevant normal distribution and you can prove this using characteristic functions.

It may be more convincing to show the densities for $Z_n$ as $n$ increases, illustrated here when $n$ is $1$ (red), $5$ (blue), $25$ (green) and $125$ (pink), and compare it with the predicted limiting normal distribution in black. For small $n$ the approximation is poor, especially since $Z_n \ge -\sqrt{n}$ with probability $1$, but for large $n$ you can see the convergence in distribution.

[Figure: densities of $Z_n$ for $n = 1$ (red), $5$ (blue), $25$ (green), and $125$ (pink), with the limiting normal density in black]
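These moments can be confirmed by simulation; a NumPy sketch for the largest $n$ shown in the figure:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 125, 1_000_000

# X_n ~ N(1, 1/n), so Z_n = sqrt(n) * (X_n^2 - 1) as in the answer.
X_n = rng.normal(1.0, 1 / np.sqrt(n), size=reps)
Z_n = np.sqrt(n) * (X_n**2 - 1)

print(Z_n.mean())  # exact mean is 1/sqrt(n), about 0.089
print(Z_n.var())   # exact variance is 4 + 2/n = 4.016
```

Both sample moments sit close to the limiting $N(0, 4)$ values already at $n=125$.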

Henry