My comments here are restricted to the 'square-root transformation' part of the
link and do not apply to the (much more useful) 'Box-Cox transformations'. In particular, I will discuss the use of square-root transformations with two-sample t tests.
Transformation for a specific example: Suppose you have two datasets of Poisson counts and want to test whether
the population Poisson means are equal. Here are two random Poisson samples
of size 20 each, one from $\mathsf{Pois}(\lambda=10)$ and one from
$\mathsf{Pois}(12).$
set.seed(2019)
x1 = rpois(20, 10); x2 = rpois(20, 12)
sort(x1); sort(x2)
[1] 4 4 5 7 7 7 8 9 9 10
[11] 11 11 11 12 12 12 12 14 14 17
[1] 7 7 9 10 10 10 11 11 12 12
[11] 12 12 12 14 14 14 16 19 21 23
Because the population variances are numerically the same as the Poisson
means, we expect that the sample variance of sample x1
may be smaller
than that of x2
. For our particular samples, this turns out to be true:
var(x1); var(x2)
[1] 12.06316
[1] 17.85263
There are theoretical reasons to believe that if we take square roots of
the Poisson counts, then the variances of the transformed samples will be
more nearly equal: the square root is approximately variance-stabilizing for
Poisson data, and a delta-method argument gives the variance of the transformed
counts as roughly $1/4$ regardless of the mean. That is also true for our data.
var(sqrt(x1)); var(sqrt(x2))
[1] 0.3376489
[1] 0.3218574
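A quick simulation check of that $1/4$ approximation (a side demonstration, not needed for the main argument) uses large samples at several values of $\lambda$:
# Side check: var(sqrt(X)) for Poisson X is roughly 1/4, whatever lambda
set.seed(101)   # arbitrary seed for this side check
lams = c(5, 10, 12, 20, 50)
sapply(lams, function(lam) var(sqrt(rpois(10^5, lam))))   # each roughly 0.25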
Pooled two-sample t test: Because the pooled two-sample t test requires the population variances to be
equal, it seems reasonable that we would get better results from this
t test by using the square-root transformed data.
Here are the P-values
from the (questionable) pooled t test on the original data (0.019) and from the test on the transformed counts (0.016). [The pooled t test on the original data is questionable because, for Poisson data, the population variances are unequal whenever the population means are unequal.]
In both cases, we would reject the null hypothesis that the
Poisson means of the two populations are equal. (Because this is a simulation, we have the advantage of knowing that they are different, and so knowing that this is the correct decision; for real-life data, we would not know).
t.test(x1, x2, var.eq=T)$p.val
[1] 0.01887121
t.test(sqrt(x1), sqrt(x2), var.eq=T)$p.val
[1] 0.01643156
Welch t test: The Welch two-sample t test does not require equal population variances, so
using that test on the original counts should be OK. [Notice that the argument var.eq=T
is omitted from the t.test
call here, so the default Welch test is performed.]
t.test(x1, x2)$p.val
[1] 0.01905609
Simulation: So for our data above, it makes no difference in the decision to reject
whether we use the variance-stabilizing transformation or not. The
question remains how often this is so. We can settle that for specific
instances with a simulation.
First, we check whether the true significance level of the pooled t test
is 5% with and without the square-root transformation. (With 100,000 iterations, such estimated probabilities are accurate to about two decimal places.) The answer is Yes.
set.seed(2019)
n = 20; lam1 = lam2 = 10
m = 10^5; pv.o = pv.t = numeric(m)
for(i in 1:m) {
x1 = rpois(n,lam1); x2 = rpois(n,lam2)
pv.o[i] = t.test(x1, x2, var.eq=T)$p.val
pv.t[i] = t.test(sqrt(x1), sqrt(x2), var.eq=T)$p.val
}
mean(pv.o <= .05)
[1] 0.04964 # approx. 5%
mean(pv.t <= .05)
[1] 0.05001 # approx. 5%
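As a rough check on the two-decimal-place accuracy mentioned above, the simulation margin of error of an estimated rejection probability near 5%, based on 100,000 iterations, can be computed directly:
sqrt(.05*.95/10^5)       # simulation standard error, about 0.0007
2*sqrt(.05*.95/10^5)     # 95% margin of error, about 0.0014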
A similar simulation with $\lambda_1 = 10$ and $\lambda_2 = 12$ uses
the same code, but with lam1 = 10; lam2 = 12
at the start. This enables
us to find the power of the pooled test (with and without transformation).
Both results are close to 46% power.
mean(pv.o <= .05)
[1] 0.45945
mean(pv.t <= .05)
[1] 0.45664
For $\lambda_1 = 10, \lambda_2 = 14,$ both estimates of power are about 95%.
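For convenience, the simulation can be wrapped in a small function (a sketch along the lines of the code above, with a reduced default number of iterations), so that the power of the pooled test, with and without the transformation, can be estimated for any pair of Poisson means:
# Sketch: estimated power of the pooled t test, with and without
# the square-root transformation, for Poisson means lam1 and lam2
pow.pois = function(lam1, lam2, n=20, m=10^4, alpha=.05) {
  pv.o = pv.t = numeric(m)
  for(i in 1:m) {
    x1 = rpois(n, lam1); x2 = rpois(n, lam2)
    pv.o[i] = t.test(x1, x2, var.eq=T)$p.val
    pv.t[i] = t.test(sqrt(x1), sqrt(x2), var.eq=T)$p.val
  }
  c(orig = mean(pv.o <= alpha), transf = mean(pv.t <= alpha))
}
set.seed(2019)
pow.pois(10, 14)    # both estimates should be roughly 0.95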
My conclusion is that for the specific situation investigated here,
it makes no practical difference whether one uses the square-root transformation or not---even with the pooled test where a difference
in variances might have caused trouble.
Notes: (a) Some years ago when teaching related topics I wanted to give
my class a real example with Poisson data where using the square-root transformation made a difference in the analysis using pooled t tests. Failing to find a real-life
example, I started using simulation to see if I could (with an appropriate
confession) produce one. Based on all the scenarios I tried (certainly
more than are shown here, including some one-factor ANOVAs), I concluded that the square-root transformation
is a nice theoretical idea, but of very limited practical use.
Maybe someone
on this site can find a convincing numerical example that supports the use
of the square-root transformation for t tests or one-factor ANOVAs.
(b) For more general discussions of uses of variance-stabilizing transformations of count data, search this site for 'square root transformation' or 'variance stabilizing transformation'. Some of the applications discussed there may be helpful for your work. To start, perhaps look here.
(c) In your link, I wonder whether there is a typo in specifying the 'arcsine' transformation for binomial data. I have always seen this explained in terms
of the arcsine of the square root of binomial proportions, not binomial counts. (Otherwise, you would be trying to take arcsines of quantities outside $(0,1).$)
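As a small numerical illustration of that point (my own check): for binomial proportions, the arcsine of the square root has variance roughly $1/(4n),$ nearly constant across the true proportion, which is what makes it variance-stabilizing:
# Illustration: for binomial proportions phat = X/n,
# var(asin(sqrt(phat))) is roughly 1/(4n), nearly constant in p
set.seed(202)    # arbitrary seed
n = 50; p = c(.2, .4, .6, .8)
sapply(p, function(pr) var(asin(sqrt(rbinom(10^5, n, pr)/n))))
1/(4*n)          # delta-method approximation, 0.005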