
I am trying to understand, in layman's terms, how the Anscombe transform converts a Poisson distribution into a normal distribution. So why is a log transform not sufficient on its own to obtain a normal distribution?

I understand that the Anscombe transform performs a variance stabilization. Is this somewhat similar to applying a z-score transform (standardisation), so that the variance tends to 1 or to a constant? Is it that a log transform on its own cannot produce a stable enough variance, even though the distribution becomes normal?

@Henry So is it the case that the Anscombe transform stabilises the variance, whilst the log transform transforms the standard deviation? "Adjusting for the mean of the square root of the sum (a little less than $\sqrt{n\mu}$) also gives convergence in distribution to a normal distribution": why does the Anscombe transform not take this mean into account to move the distribution towards normal? I understand that "a Poisson random variable can take the value 0 with positive probability" is an issue for the log transform, but why not just compute a z-score, i.e. subtract the mean and divide by the standard deviation? That would also result in a normal distribution. Would this not achieve what the log and Anscombe transforms do in combination?

SysEng
  • All Poisson distributions have positive chances of being zero. What is the log of zero? – whuber Jul 22 '21 at 19:47
  • @whuber apparently the fashion in cases where $X$ can be zero but not negative is to use $\log(X+1)$ – Henry Jul 22 '21 at 22:48
    The Anscombe transform does not convert a Poisson distributed variable into one with a normal distribution. Related / relevant: https://stats.stackexchange.com/questions/46418/why-is-the-square-root-transformation-recommended-for-count-data – Glen_b Jul 23 '21 at 02:13
    @Henry That is problematic. I proposed a better procedure at https://stats.stackexchange.com/a/30749/919. – whuber Jul 23 '21 at 14:30

1 Answer


Taking the square root of the sum of $n$ iid non-negative random variables with mean $\mu>0$ and variance $\sigma^2>0$ is variance stabilising in general (see a related result), in that the variance of the square root of the sum heads to about $\frac{\sigma^2}{4\mu}$ as $n$ increases, and that limit does not depend on $n$. You cannot say that for the logarithm of the sum. Adjusting for the mean of the square root of the sum (a little less than $\sqrt{n\mu}$) also gives convergence in distribution to a normal distribution as $n$ increases.
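A rough first-order (delta-method) sketch of where that limit comes from, writing $S_n$ for the sum (a label introduced here), which has mean $n\mu$ and variance $n\sigma^2$, and replacing each function by its tangent line at $n\mu$:

$$
\begin{aligned}
\operatorname{Var}\left(\sqrt{S_n}\right) &\approx \left(\frac{1}{2\sqrt{n\mu}}\right)^2 n\sigma^2 = \frac{\sigma^2}{4\mu},\\
\operatorname{Var}\left(\log S_n\right) &\approx \left(\frac{1}{n\mu}\right)^2 n\sigma^2 = \frac{\sigma^2}{n\mu^2}.
\end{aligned}
$$

The first line does not involve $n$, while the second shrinks towards $0$ as $n$ grows, which is the sense in which the logarithm of the sum is not variance stabilising. This heuristic ignores higher-order terms.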

A Poisson random variable $X$ with mean and variance $n$ can be seen as the sum of $n$ iid Poisson random variables with mean and variance $1$, so you can apply the previous result to conclude that $\sqrt X$ has a variance heading towards $\frac14$ as $n$ increases. Multiplying by $2$, $2\sqrt X$ has a variance heading towards $1$ as $n$ increases. Make a slight adjustment to $2\sqrt{X+\frac38}$ (the Anscombe transform) and the convergence of the variance to $1$ is faster.
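To see the faster convergence numerically, here is a minimal Python sketch (not part of the original answer) that computes the variances exactly by summing over the Poisson probabilities rather than simulating. The helper names `poisson_pmf`, `transformed_variance`, `two_sqrt` and `anscombe`, the chosen means, and the cut-off of the negligible upper tail are all illustrative assumptions.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), lam > 0, computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def transformed_variance(g, lam: float) -> float:
    """Variance of g(X) for X ~ Poisson(lam), summing the pmf up to a generous cut-off."""
    upper = int(lam + 12 * math.sqrt(lam) + 30)  # probability beyond this is negligible
    terms = [(poisson_pmf(k, lam), g(k)) for k in range(upper + 1)]
    mean = math.fsum(p * v for p, v in terms)
    return math.fsum(p * (v - mean) ** 2 for p, v in terms)


def two_sqrt(x: int) -> float:
    return 2 * math.sqrt(x)


def anscombe(x: int) -> float:
    return 2 * math.sqrt(x + 3 / 8)


for lam in (2, 5, 10, 50, 200):
    print(
        f"mean {lam:>3}: Var(2*sqrt(X)) = {transformed_variance(two_sqrt, lam):.5f}, "
        f"Var(2*sqrt(X + 3/8)) = {transformed_variance(anscombe, lam):.5f}"
    )
```

Both columns should approach $1$ as the mean grows, with the Anscombe column getting there sooner.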

Even then, a Poisson random variable is a discrete random variable, and so is its transform, so you need to take care when comparing its distribution with a normal approximation, especially if the mean is low.
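A small Python sketch of the discreteness point, assuming an illustrative mean of $0.5$ (chosen for this example, not taken from the answer). It lists the few values the Anscombe transform actually takes and their probabilities; the helper `anscombe_table` is hypothetical.

```python
import math


def anscombe_table(lam: float, kmax: int) -> list[tuple[int, float, float]]:
    """Rows of (k, 2*sqrt(k + 3/8), P(X = k)) for X ~ Poisson(lam) and k = 0..kmax."""
    rows = []
    for k in range(kmax + 1):
        prob = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        rows.append((k, 2 * math.sqrt(k + 3 / 8), prob))
    return rows


for k, value, prob in anscombe_table(0.5, 4):
    print(f"X = {k}: transformed value {value:.3f} with probability {prob:.4f}")
```

About 61% of the probability ($e^{-0.5}$) sits on the single value $2\sqrt{3/8}$, so no normal curve describes this well, whatever its variance.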

Another issue is that a Poisson random variable can take the value $0$ with positive probability, and that would not give a finite logarithm, so you would probably want to use something like $\log(X+1)$ instead. For a Poisson random variable with mean $n$, the variance of $\log(X+1)$ is close to $\frac1n$ for large $n$ (a first-order approximation gives $\frac{n}{(n+1)^2}$), which is not stable as $n$ increases.
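The simulation in the comments below can also be checked without random sampling. This Python sketch sums over the Poisson probabilities (the helper names are hypothetical) and prints $n \cdot \operatorname{Var}(\log(X+1))$ for a few Poisson means $n$; if the variance is close to $\frac1n$, the product should sit near $1$.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), lam > 0, computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def transformed_variance(g, lam: float) -> float:
    """Variance of g(X) for X ~ Poisson(lam), summing the pmf up to a generous cut-off."""
    upper = int(lam + 12 * math.sqrt(lam) + 30)  # probability beyond this is negligible
    terms = [(poisson_pmf(k, lam), g(k)) for k in range(upper + 1)]
    mean = math.fsum(p * v for p, v in terms)
    return math.fsum(p * (v - mean) ** 2 for p, v in terms)


for n in (10, 100, 1234, 10_000):
    v = transformed_variance(math.log1p, n)  # math.log1p(x) computes log(x + 1)
    print(f"n = {n:>6}: Var(log(X+1)) = {v:.6f}, n * Var = {n * v:.4f}")
```

Here $n$ is the Poisson mean, as in the answer, not a sample size.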

But there are other cases where taking logarithms can be variance stabilising. If you have a family of positive random variables where the standard deviation is proportional to the mean (random variables with gamma distributions of fixed shape are examples - including exponential distributions), then taking logarithms can be variance stabilising even if it does not lead to a normal distribution. Poisson random variables do not fit this condition, since it is their variance not standard deviation which is proportional to the mean.
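A Monte Carlo sketch in Python of the gamma case, assuming shape $2$ and scales $1$ and $100$ purely for illustration. With the shape fixed, the variance of $\log Y$ should stay put (for shape $k$ it equals the trigamma function $\psi'(k)$, here $\pi^2/6-1\approx0.645$), while the variance of $\sqrt Y$ grows in proportion to the scale. The helpers `sample_variance` and `log_and_sqrt_variance` are hypothetical; `random.Random.gammavariate(alpha, beta)` is the standard-library sampler with shape `alpha` and scale `beta`.

```python
import math
import random


def sample_variance(xs: list[float]) -> float:
    """Unbiased sample variance, using fsum for accuracy."""
    mean = math.fsum(xs) / len(xs)
    return math.fsum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def log_and_sqrt_variance(shape: float, scale: float, n: int = 50_000, seed: int = 0):
    """Draw n gamma(shape, scale) values and return (Var(log Y), Var(sqrt Y))."""
    rng = random.Random(seed)
    ys = [rng.gammavariate(shape, scale) for _ in range(n)]
    return (
        sample_variance([math.log(y) for y in ys]),
        sample_variance([math.sqrt(y) for y in ys]),
    )


# Different seeds so the two scales use independent samples.
for scale, seed in ((1.0, 1), (100.0, 2)):
    var_log, var_sqrt = log_and_sqrt_variance(2.0, scale, seed=seed)
    print(f"scale {scale:>5}: Var(log Y) = {var_log:.3f}, Var(sqrt Y) = {var_sqrt:.3f}")
```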

Henry
  • Re your remark about "variance of $\log(X+1)$ seems to be close to $1/n$ for large $n$": how do you obtain that? It's not true. – whuber Jul 27 '21 at 22:23
  • @whuber For example `1/var(log(rpois(10^6,1234)+1))` seems to be close to $1234$ – Henry Jul 27 '21 at 22:41
  • Ah... I had understood from the question that it concerned sampling and therefore $n$ would naturally be the sample size. Although you clearly define it to be the Poisson mean, in this context that is confusing! – whuber Jul 29 '21 at 13:53
  • @whuber - my apologies if it was confusing - I had made it $n$ so I could use the argument of the sum of $n$ cases with parameter $1$ – Henry Jul 29 '21 at 14:18
  • @Henry - this is great. I have updated the question with a request for further clarification. Please ignore the bottom part of the update, as I now understand the difference would be that the log/Anscombe transform gives fold differences but the z-score does not. – SysEng Aug 09 '21 at 12:43
  • @SysEng - Your edit suggests that variance stabilisation is not actually your objective (variance stabilisation is equivalent to standard deviation stabilisation). If you have a Poisson distribution with a high mean, then it is already close to normally distributed, though restricted to integer values. If it has a low mean, then nothing you can do will make it normally distributed. – Henry Aug 09 '21 at 14:34