
$\newcommand{\E}{\mathbb{E}}$How is the $A(\cdot) = \displaystyle\int\frac{du}{V^{1/3}(\mu)}$ normalizing transform for the exponential family derived?

More specifically: I tried to follow the Taylor expansion sketch on page 3, slide 1 here but have several questions. With $X$ from an exponential family, transformation $h(X)$, and $\kappa _i$ denoting the $i^{th}$ cumulant, the slides argue that: $$ \kappa _3(h(\bar{X})) \approx h'(\mu)^3\frac{\kappa _3(\bar{X})}{N^2} + 3h'(\mu)^2h''(\mu)\frac{\sigma^4}{N} + O(N^{-3}), $$ and it remains to simply find $h(X)$ such that the above evaluates to 0.

  1. My first question is about arithmetic: my Taylor expansion has different coefficients, and I can't see how they justify dropping many of the terms.

    \begin{align} \text{Since }h(x) &\approx h(\mu) + h'(\mu)(x - \mu) + \frac{h''(\mu)}{2}(x - \mu)^2\text{, we have:} \\ h(\bar{X}) - h(\mu) &\approx h'(\mu)(\bar{X} - \mu) + \frac{h''(\mu)}{2}(\bar{X} - \mu)^2 \\ \E\left(h(\bar{X}) - h(\mu)\right)^3 &\approx h'(\mu)^3 \E(\bar{X}-\mu)^3 + \frac{3}{2}h'(\mu)^2h''(\mu) \E(\bar{X} - \mu)^4 \\ &\quad + \frac{3}{4}h'(\mu)h''(\mu)^2 \E(\bar{X}-\mu)^5 + \frac{1}{8}h''(\mu)^3 \E(\bar{X} - \mu)^6. \end{align}

    I can get to something similar by replacing the central moments by their cumulant equivalents, but it still doesn't add up.

  2. The second question: why does the analysis start with $\bar{X}$ instead of $X$, the quantity we actually care about?


2 Answers


The slides you link to are somewhat confusing, leaving out steps and making a few typos, but they are ultimately correct. It will help to answer question 2 first, then 1, and then finally derive the symmetrizing transformation $A(u) = \int_{-\infty}^u \frac{1}{[V(\theta)]^{1/3}} d\theta$.

Question 2. We are analyzing $\bar{X}$ because it is the mean of a sample of size $N$ of i.i.d. random variables $X_1, ..., X_N$. This is an important quantity because sampling the same distribution and taking the mean happens all the time in science. We want to know how close $\bar{X}$ is to the true mean $\mu$. The Law of Large Numbers says it converges to $\mu$ as $N \to \infty$ (and the Central Limit Theorem describes its limiting distribution), but we would also like to know the variance and skewness of $\bar{X}$ for finite $N$.

Question 1. Your Taylor series approximation is not incorrect, but we need to be careful about keeping track of $\bar{X}$ vs. $X_i$ and powers of $N$ to get to the same conclusion as the slides. We'll start with the definitions of $\bar{X}$ and central moments of $X_i$ and derive the formula for $\kappa_3(h(\bar{X}))$:

$\bar{X} = \frac{1}{N}\sum_{i=1}^N X_i$

$\mathbb{E}[X_i] = \mu$

$V(X_i) = \mathbb{E}[(X_i - \mu)^2] = \sigma^2$

$\kappa_3(X_i) = \mathbb{E}[(X_i - \mu)^3]$

Now, the central moments of $\bar{X}$:

$\mathbb{E}[\bar{X}] = \frac{1}{N}\sum_{i=1}^N \mathbb{E}[X_i] = \frac{1}{N}(N\mu) = \mu$

$\begin{align} V(\bar{X}) &=\mathbb{E}[(\bar{X} - \mu)^2]\\ &=\mathbb{E}[\Big((\frac{1}{N}\sum_{i=1}^N X_i) - \mu\Big)^2]\\ &=\mathbb{E}[\Big(\frac{1}{N}\sum_{i=1}^N (X_i - \mu)\Big)^2]\\ &=\frac{1}{N^2}\Big(N\mathbb{E}[(X_i - \mu)^2] + N(N-1)\mathbb{E}[X_i - \mu]\mathbb{E}[X_j - \mu]\Big)\\ &= \frac{1}{N}\sigma^2 \end{align}$

The last step follows since $\mathbb{E}[X_i - \mu] = 0$, and $\mathbb{E}[(X_i - \mu)^2] = \sigma^2$. This might not have been the easiest derivation of $V(\bar{X})$, but it is the same process we need to do to find $\kappa_3(\bar{X})$ and $\kappa_3(h(\bar{X}))$, where we break up a product of a summation and count the number of terms with powers of different variables. In the above case, there were $N$ terms that were of the form $(X_i - \mu)^2$ and $N(N-1)$ terms of the form $(X_i - \mu)(X_j - \mu)$.

$\begin{align} \kappa_3(\bar{X}) &= \mathbb{E}[(\bar{X}-\mu)^3]\\ &= \mathbb{E}[\Big((\frac{1}{N}\sum_{i=1}^N X_i) - \mu\Big)^3]\\ &= \mathbb{E}[\Big(\frac{1}{N}\sum_{i=1}^N (X_i - \mu)\Big)^3]\\ &= \frac{1}{N^3}\Big(N\mathbb{E}[(X_i - \mu)^3] + 3N(N-1)\mathbb{E}[X_i - \mu]\mathbb{E}[(X_j - \mu)^2]+N(N-1)(N-2)\mathbb{E}[X_i - \mu]\mathbb{E}[X_j - \mu]\mathbb{E}[X_k - \mu]\Big)\\ &= \frac{1}{N^2}\mathbb{E}[(X_i - \mu)^3]\\ &= \frac{\kappa_3(X_i)}{N^2} \end{align}$
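These two scaling facts are easy to check by simulation. Below is a minimal Monte Carlo sketch (my own addition, not from the slides), using Exponential(1) draws so that $\mu = 1$, $\sigma^2 = 1$, and $\kappa_3(X_i) = 2$; the seed and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 50, 200_000  # sample size and number of Monte Carlo replications

# Exponential(1): mu = 1, sigma^2 = 1, kappa_3(X_i) = E[(X_i - mu)^3] = 2
samples = rng.exponential(scale=1.0, size=(reps, N))
xbar = samples.mean(axis=1)

var_xbar = np.mean((xbar - 1.0) ** 2)  # should be close to sigma^2 / N = 0.02
k3_xbar = np.mean((xbar - 1.0) ** 3)   # should be close to kappa_3 / N^2 = 0.0008

print(var_xbar, 1.0 / N)
print(k3_xbar, 2.0 / N ** 2)
```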

Next, we will expand $h(\bar{X})$ in a Taylor series as you have:

$h(\bar{X}) = h(\mu) + h'(\mu)(\bar{X} - \mu) + \frac{1}{2}h''(\mu)(\bar{X}-\mu)^2 + \frac{1}{6}h'''(\mu)(\bar{X}-\mu)^3 + ...$

$\begin{align} \mathbb{E}[h(\bar{X})] &= h(\mu) + h'(\mu)\mathbb{E}[\bar{X} - \mu] + \frac{1}{2}h''(\mu)\mathbb{E}[(\bar{X}-\mu)^2] + \frac{1}{6}h'''(\mu)\mathbb{E}[(\bar{X}-\mu)^3] + ...\\ &= h(\mu) + \frac{1}{2}h''(\mu)\frac{\sigma^2}{N} + \frac{1}{6}h'''(\mu)\frac{\kappa_3(X_i)}{N^2} + ...\\ \end{align}$
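For concreteness (an example I am adding, not one from the slides), take $h(x) = \log x$ for positive data, so $h''(x) = -1/x^2$ and $h'''(x) = 2/x^3$; the expansion then reads $$\mathbb{E}[\log \bar{X}] \approx \log\mu - \frac{\sigma^2}{2N\mu^2} + \frac{\kappa_3(X_i)}{3N^2\mu^3} + ...,$$ which recovers the familiar fact that $\log \bar{X}$ is biased downward for $\log\mu$ at order $N^{-1}$.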

With some more effort you could prove the rest of the terms are $O(N^{-3})$. Finally, since $\kappa_3(h(\bar{X})) = \mathbb{E}[(h(\bar{X})-\mathbb{E}[h(\bar{X})])^3]$ (which is not the same as $\mathbb{E}[(h(\bar{X})-h(\mu))^3]$), we again make a similar computation:

$\begin{align} \kappa_3(h(\bar{X})) &= \mathbb{E}[(h(\bar{X})-\mathbb{E}[h(\bar{X})])^3]\\ &=\mathbb{E}\Big[\Big(h(\mu) + h'(\mu)(\bar{X} - \mu) + \frac{1}{2}h''(\mu)(\bar{X}-\mu)^2 + O((\bar{X}-\mu)^3) - h(\mu) - \frac{1}{2}h''(\mu)\frac{\sigma^2}{N} - O(N^{-2})\Big)^3\Big] \end{align}$

We are only interested in the terms resulting in order $O(N^{-2})$, and with extra work you could show that you do not need the terms "$O((\bar{X}-\mu)^3)$" or "$- O(N^{-2})$" before taking the third power, as they will only result in terms of order $O(N^{-3})$. So, simplifying, we get

$\begin{align} \kappa_3(h(\bar{X})) &= \mathbb{E}\Big[\Big(h'(\mu)(\bar{X} - \mu) + \frac{1}{2}h''(\mu)(\bar{X}-\mu)^2 - \frac{1}{2}h''(\mu)\frac{\sigma^2}{N}\Big)^3\Big]\\ &=\mathbb{E}\Big[h'(\mu)^3(\bar{X} - \mu)^3 + \frac{1}{8}h''(\mu)^3(\bar{X}-\mu)^6 - \frac{1}{8}h''(\mu)^3\frac{\sigma^6}{N^3} + \frac{3}{2}h'(\mu)^2h''(\mu)(\bar{X}-\mu)^4 + \frac{3}{4}h'(\mu)h''(\mu)^2(\bar{X}-\mu)^5 - \frac{3}{2}h'(\mu)^2h''(\mu)(\bar{X} - \mu)^2\frac{\sigma^2}{N} + O(N^{-3})\Big] \end{align}$

I left off some terms that were obviously $O(N^{-3})$ in this product. You'll have to convince yourself that the terms $\mathbb{E}[(\bar{X}-\mu)^5]$ and $\mathbb{E}[(\bar{X}-\mu)^6]$ are $O(N^{-3})$ as well. However,

$\begin{align} \mathbb{E}[(\bar{X}-\mu)^4] &= \mathbb{E}[\frac{1}{N^4}\Big(\sum_{i=1}^N(X_i-\mu)\Big)^4]\\ &=\frac{1}{N^4}\Big(N\mathbb{E}[(X_i-\mu)^4] + 3N(N-1)\mathbb{E}[(X_i-\mu)^2]\mathbb{E}[(X_j-\mu)^2] + 0\Big)\\ &=\frac{3}{N^2}\sigma^4 + O(N^{-3}) \end{align}$

Then distributing the expectation on our equation for $\kappa_3(h(\bar{X}))$, we have

$\begin{align}\kappa_3(h(\bar{X})) &= h'(\mu)^3\mathbb{E}[(\bar{X} - \mu)^3] + \frac{3}{2}h'(\mu)^2h''(\mu)\mathbb{E}[(\bar{X}-\mu)^4] - \frac{3}{2}h'(\mu)^2h''(\mu)\mathbb{E}[(\bar{X} - \mu)^2]\frac{\sigma^2}{N} + O(N^{-3})\\ &= h'(\mu)^3\frac{\kappa_3(X_i)}{N^2} + \frac{9}{2}h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} - \frac{3}{2}h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} + O(N^{-3})\\ &=h'(\mu)^3\frac{\kappa_3(X_i)}{N^2} + 3h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} + O(N^{-3}) \end{align}$
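To see this formula in action numerically, here is a short simulation sketch (again my own addition, not from the slides): take $h(x) = \sqrt{x}$ and Exponential(1) data, so $\mu = 1$, $\sigma^2 = 1$, $\kappa_3(X_i) = 2$, $h'(1) = 1/2$, and $h''(1) = -1/4$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 50, 400_000

samples = rng.exponential(scale=1.0, size=(reps, N))
h_xbar = np.sqrt(samples.mean(axis=1))  # h(X-bar) with h(x) = sqrt(x)

# Empirical third central moment of h(X-bar)
k3_mc = np.mean((h_xbar - h_xbar.mean()) ** 3)

# Formula: h'(mu)^3 * kappa_3 / N^2 + 3 * h'(mu)^2 * h''(mu) * sigma^4 / N^2
k3_formula = (0.5 ** 3) * 2 / N ** 2 + 3 * (0.5 ** 2) * (-0.25) * 1.0 / N ** 2

print(k3_mc, k3_formula)  # both roughly 0.0625 / N^2 = 2.5e-5, up to MC noise and O(N^-3) terms
```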

This concludes the derivation of $\kappa_3(h(\bar{X}))$. Now, at last, we will derive the symmetrizing transform $A(u) = \int_{-\infty}^u \frac{1}{[V(\theta)]^{1/3}} d\theta$.

For this transformation, it is important that $X_i$ comes from an exponential family distribution, and in particular a natural exponential family (or has been transformed into one), of the form $f_{X_i}(x;\theta) = c(x)\exp(\theta x - b(\theta))$, where the base measure is written as $c(x)$ to avoid a clash with the transform $h$.
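For example (a standard member of this family, not specific to the slides), the Poisson distribution with mean $\lambda$ fits this form with natural parameter $\theta = \log\lambda$: $$f(x;\theta) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{1}{x!}\exp(\theta x - e^{\theta}),$$ so $c(x) = 1/x!$ and $b(\theta) = e^{\theta}$.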

In this case, the cumulants of the distribution are given by $\kappa_k = b^{(k)}(\theta)$. So $\mu = b'(\theta)$, $\sigma^2 = V(\theta) = b''(\theta)$, and $\kappa_3 = b'''(\theta)$. We can write the parameter $\theta$ as a function of $\mu$ by taking the inverse of $b'$, writing $\theta(\mu) = (b')^{-1}(\mu)$. Then

$\theta'(\mu) = \frac{1}{b''((b')^{-1}(\mu))} = \frac{1}{b''(\theta)} = \frac{1}{\sigma^2}$

Next we can write the variance as a function of $\mu$, and call this function $\bar{V}$:

$\bar{V}(\mu) = V(\theta(\mu)) = b''(\theta(\mu))$

Then

$\frac{d}{d\mu}\bar{V}(\mu) = V'(\theta(\mu))\theta'(\mu) = b'''(\theta)\frac{1}{\sigma^2} = \frac{\kappa_3}{\sigma^2}$

So as a function of $\mu$, $\kappa_3(\mu) = \bar{V}'(\mu)\bar{V}(\mu)$.
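Continuing the Poisson example: $b(\theta) = e^{\theta}$ gives $\mu = \sigma^2 = \kappa_3 = e^{\theta}$, so $\bar{V}(\mu) = \mu$ and indeed $\bar{V}'(\mu)\bar{V}(\mu) = 1 \cdot \mu = \mu = \kappa_3(\mu)$.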

Now, for the symmetrizing transformation, we want to reduce the skewness of $h(\bar{X})$ by making $h'(\mu)^3\frac{\kappa_3(X_i)}{N^2} + 3h'(\mu)^2h''(\mu)\frac{\sigma^4}{N^2} = 0$ so that $\kappa_3(h(\bar{X}))$ is $O(N^{-3})$. Thus, we want

$h'(\mu)^3\kappa_3(X_i) + 3h'(\mu)^2h''(\mu)\sigma^4 = 0$

Substituting our expressions for $\sigma^2$ and $\kappa_3$ as functions of $\mu$, we have:

$h'(\mu)^3\bar{V}'(\mu)\bar{V}(\mu) + 3h'(\mu)^2h''(\mu)\bar{V}(\mu)^2 = 0$

So $h'(\mu)^3\bar{V}'(\mu) + 3h'(\mu)^2h''(\mu)\bar{V}(\mu) = 0$, leading to $\frac{d}{d\mu}(h'(\mu)^3\bar{V}(\mu)) = 0$.

One solution to this differential equation is:

$h'(\mu)^3\bar{V}(\mu) = 1$,

$h'(\mu) = \frac{1}{[\bar{V}(\mu)]^{1/3}}$

So, $h(\mu) = \int_c^\mu \frac{1}{[\bar{V}(\theta)]^{1/3}} d\theta$, for any constant $c$. This gives us the symmetrizing transformation $A(u) = \int_{-\infty}^u \frac{1}{[V(\theta)]^{1/3}} d\theta$, where $V$ is the variance as a function of the mean in a natural exponential family.
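As a final sanity check (a sketch of my own, not from the slides): for the Poisson family $\bar{V}(\mu) = \mu$, so $h(\mu) \propto \mu^{2/3}$, and a quick simulation shows that this power transform does knock down the skewness of $\bar{X}$.

```python
import numpy as np

def skewness(z):
    """Standardized sample skewness."""
    z = z - z.mean()
    return np.mean(z ** 3) / np.mean(z ** 2) ** 1.5

rng = np.random.default_rng(2)
N, reps, lam = 10, 200_000, 2.0

# Means of Poisson(lam) samples of size N
xbar = rng.poisson(lam=lam, size=(reps, N)).mean(axis=1)

print(skewness(xbar))             # clearly positive, roughly 1/sqrt(lam * N) ~ 0.22
print(skewness(xbar ** (2 / 3)))  # much closer to zero after the symmetrizing transform
```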


$\blacksquare$ 1. Why can't I get the same result by approximating in terms of noncentral moments $\mathbb{E}\bar{X}^k$ and then calculating the central moments $\mathbb{E}(\bar{X}-\mathbb{E}\bar{X})^k$ using the approximated noncentral moments?

Because you change the derivation arbitrarily and drop the remainder term, which is important. If you are not familiar with big O notation and the relevant results, a good reference is [Casella&Lehmann].

$$h(\bar{X}) - h(\mu) \approx h'(\mu)(\bar{X} - \mu) + \frac{h''(\mu)}{2}(\bar{X} - \mu)^2 +O[(\bar{X} - \mu)^3]$$

$$\mathbb{E}[h(\bar{X}) - h(\mu)] \approx h'(\mu)\mathbb{E}(\bar{X} - \mu) + \frac{h''(\mu)}{2}\mathbb{E}(\bar{X} - \mu)^2+(?) $$

But even if you do not drop the remainder by arguing that you are always taking $N\rightarrow \infty$ (which is not legal...), the following step: $$ \E\left(h(\bar{X}) - h(\mu)\right)^3 \approx h'(\mu)^3 \E(\bar{X}-\mu)^3 + \frac{3}{2}h'(\mu)^2h''(\mu) \E(\bar{X} - \mu)^4 + \frac{3}{4}h'(\mu)h''(\mu)^2 \E(\bar{X}-\mu)^5 + \frac{1}{8}h''(\mu)^3 \E(\bar{X} - \mu)^6 \quad (1)$$ is saying that $$\int [h(x)-h(x_0)]^3\,dx=\int [h'(x_0)(x-x_0)+\tfrac{1}{2}h''(x_0)(x-x_0)^2+O((x-x_0)^3)]^3\,dx=(1)$$

If this is still not clear, the algebra of expanding the integrand goes as

$[h'(x_0)(x-x_0)+\frac{1}{2}h''(x_0)(x-x_0)^2+O((x-x_0)^3)]^3 \quad (2)$

Letting $A=h'(x_0)(x-x_0)$, $B=\frac{1}{2}h''(x_0)(x-x_0)^2$, and $C=O((x-x_0)^3)$, we have $$(2)=[A+B+C]^3 \color{red}{\neq} A^3+3A^2 B+3A B^2+B^3=[A+B]^3=(1).$$
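A two-line symbolic check (my own illustration, using sympy) makes the dropped cross terms explicit:

```python
import sympy as sp

A, B, C = sp.symbols('A B C')

# Difference between cubing with and without the remainder term C
diff = sp.expand((A + B + C) ** 3 - (A + B) ** 3)
print(diff)  # prints the six extra terms, every one of which involves C
```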

Your mistake is to omit the remainder before expanding, which is a classic mistake with big O notation and one of the reasons its usage is sometimes criticized.

$\blacksquare$ 2. Why does the analysis start with $\bar{X}$ instead of $X$, the quantity we actually care about?

Because we want to base our analysis on the sufficient statistic of the exponential model we are introducing. If you have a sample of size 1, then there is no difference whether you analyze $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$ or $X_1$.

This is a good lesson in big O notation though it is not relevant to GLM...

Reference: [Casella&Lehmann] Lehmann, Erich Leo, and George Casella. Theory of Point Estimation. Springer Science & Business Media, 2006.
