This "approximately unbiased test" is the "multiscale bootstrap" reported by Shimodaira in 2002, as another answer noted. The short answer is that simple bootstrapping does not always correctly determine the probability that a multi-dimensional observation came from a particular region of interest, as the shapes of boundaries between regions can affect simple bootstrap-estimated results. The multiscale bootstrap is one solution to this problem.
A motivating scientific issue was how to choose among competing phylogenetic trees of organisms, based for example on differences in DNA sequences. With only 4 possible bases (T, C, G, A) at each position along a DNA sequence, a simple model of conversion probabilities over time for each pair of bases allows for calculating the likelihood of a particular phylogenetic tree, given the DNA sequences of the various species in question.
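As a minimal sketch of that kind of likelihood calculation, here is the simplest such model (Jukes-Cantor) applied to a single aligned pair of sequences at an assumed rate-times-time value; real tree likelihoods combine such transition probabilities over all branches and sum over unobserved ancestral states. The function name and numbers here are mine, for illustration only.

```r
## Jukes-Cantor (JC69): after rate x time mt, a site keeps its base with
## probability 1/4 + (3/4) exp(-4 mt / 3) and changes to each particular
## other base with probability 1/4 - (1/4) exp(-4 mt / 3).
jc_loglik <- function(seq1, seq2, mt) {
  p_same <- 1/4 + (3/4) * exp(-4 * mt / 3)
  p_diff <- 1/4 - (1/4) * exp(-4 * mt / 3)
  n_same <- sum(seq1 == seq2)
  n_diff <- length(seq1) - n_same
  n_same * log(p_same) + n_diff * log(p_diff)
}

jc_loglik(c("T", "C", "G", "A", "A"),
          c("T", "C", "G", "A", "G"), mt = 0.1)
```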
The question is whether phylogenetic tree $i$ with log-likelihood $Y_i$ has the largest expected value of log-likelihood ($\mu_i$) among all $M$ trees under consideration. The vector $Y$ containing log-likelihood values for all $M$ trees is distributed around the parameter vector $\mu$ of corresponding expected log-likelihood values. That is, the hypothesis $H_i$
$$H_i : \mu_i \ge \mu_j \quad \text{for all } j = 1, \dots, M,$$
represents a region in the $M$-dimensional parameter space of $\mu$, for which $Y$ is the observed vector.
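Concretely, $Y \in H_i$ just says that tree $i$ achieved the largest observed log-likelihood (hypothetical numbers below):

```r
Y <- c(-1234.6, -1237.1, -1235.0)  # log-likelihoods of M = 3 candidate trees
which.max(Y)                       # Y lies in H_1: tree 1 looks best, but
                                   # tree 1 need not have the largest mu_i
```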
This is the problem (Shimodaira 2002, p. 495):
> Considering that $Y$ is distributed around $\mu$, one might believe that the hypothesis $\mu \in H_i$ is probable when the event $Y \in H_i$ is observed. There is, however, the possibility that $Y \in H_i$ by chance, even though $\mu \in H_j$ for some $j \neq i$. In other words, the selected tree with the largest $Y_i$ value is not necessarily the tree with the largest $\mu_i$ value.
This is a particular example of the Problem of Regions discussed by Efron and Tibshirani in 1998, following up in part on earlier work by Efron et al. on bootstrapping phylogenetic trees. Even in a simple 2-region case, the probability of $Y \in H_i$, as estimated by bootstrapping from a sample that provided the maximum observed $Y_i$ (the "confidence value" $\tilde\alpha$ of Efron and Tibshirani), is not the same as the confidence level $\hat\alpha$ with respect to a null hypothesis, $P_{1-\hat\alpha}(Y \notin H_i)$, unless the boundary between the regions is flat. Their Figure 2, copied below, shows the situation for a smooth curved boundary and sampling from a $K$-variate normal distribution with unknown means and identity covariance matrix:
*[Figure 2 of Efron & Tibshirani (1998): regions $R_1$ and $R_2$ separated by a smooth curved boundary, with the observed point $y=\hat\mu$ outside the null region $R_1$.]*
The null hypothesis in this example is $\mu \in R_1$. Evaluating the probability that the observed $y=\hat\mu$ might have arisen under that null hypothesis requires knowing the distances between $y=\hat\mu$ and the points $\mu \in R_1$, information that simple bootstrapping from the sample does not adequately provide unless the boundary between the regions is flat. In the example above, with the boundary curved away from the observed value, most of $R_1$ lies far from $y=\hat\mu$, so points in $R_1$ are less likely to have produced the observed value by chance. A boundary curved toward $y=\hat\mu$ would, in contrast, increase the chance of obtaining that value under the null hypothesis.
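A small simulation can make this concrete. The disk-shaped null region, the observed point, and all numbers below are my own toy choices (Efron and Tibshirani's figure is schematic); the point is only that the naive bootstrap "confidence value" and the curvature-aware confidence level disagree when the boundary is curved:

```r
set.seed(1)
## Toy problem of regions in K = 2 dimensions: null region R1 is the disk of
## radius 2 around the origin, so the boundary curves away from any
## observed point outside it.
r <- 2
y <- c(2.8, 0)   # observed y = mu-hat, 0.8 standard errors outside the boundary
B <- 1e5

## Naive bootstrap "confidence value" for y's region: resample y* ~ N(y, I)
## and count how often y* stays outside the disk.
ystar <- cbind(rnorm(B, y[1]), rnorm(B, y[2]))
alpha_tilde <- mean(sqrt(rowSums(ystar^2)) > r)

## Curvature-aware confidence level: with mu at the closest boundary point,
## ||Y||^2 is noncentral chi-square (2 df, noncentrality r^2), so the level
## 1 - (p-value) is available exactly in this toy case.
alpha_hat <- pchisq(sum(y^2), df = 2, ncp = r^2)

c(alpha_tilde = alpha_tilde, alpha_hat = alpha_hat)
## With a flat boundary both would equal pnorm(0.8), about 0.79; with this
## curved boundary they differ noticeably.
```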
Efron and Tibshirani showed that a second bootstrap, centered on the boundary point $\hat\mu_0$ closest to the observed point $y=\hat\mu$, provides a good approximate correction for curvature of the boundary; combined with the "confidence value" $\tilde\alpha$ from the initial bootstrap, this gives a bootstrap-based estimate of the confidence level $\hat\alpha$.
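Continuing the toy example above, the boundary-centered resampling step might look like this (only the resampling idea; Efron and Tibshirani's actual correction combines this with $\tilde\alpha$ through a calibration I do not reproduce here):

```r
## Second bootstrap, centered on the boundary point closest to y.
mu0 <- r * y / sqrt(sum(y^2))            # closest point on the disk boundary
ystar2 <- cbind(rnorm(B, mu0[1]), rnorm(B, mu0[2]))
p_boundary <- mean(sqrt(rowSums(ystar2^2)) >= sqrt(sum(y^2)))
p_boundary                               # approximates 1 - alpha_hat above
```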
Shimodaira found a different way to correct for boundary curvature, one that did not require finding the closest boundary point $\hat\mu_0$. He showed that taking bootstrap resamples with replacement of a size $N_s$ smaller or greater than the size $N$ of the data sample could serve the same purpose. Because a different resample size necessarily expands or shrinks the corresponding bootstrap-estimated "confidence regions" around the observed $y$, it also samples more or less from within the region $R$ of the null hypothesis, to a degree that depends on the curvature of the boundary. This is illustrated in the following figure from a later paper on the topic:
*[Figure from a later paper by Shimodaira: bootstrap resampling at several scales $\tau^2 = N/N_s$, with confidence regions around $y$ that shrink or expand relative to the ordinary bootstrap at $\tau = 1$.]*
Here, $\tau^2 = N/N_s$, and the bootstrap $z$-value at each scale is $z(\tau) = \Phi^{-1}(1 - \mathrm{BP}(\tau))$, where $\mathrm{BP}(\tau)$ is the proportion of resamples at that scale falling in the hypothesis region. Shimodaira showed that the coefficients of a simple nonlinear regression of these bootstrapped $z$-values against $\tau$ and $1/\tau$, for a set of $\tau$ values both above and below 1, provide a correction for boundary curvature similar to that proposed by Efron and Tibshirani. This approach has also been suggested to work reasonably well in situations with non-smooth boundaries.
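To show the mechanics, here is a sketch of that recipe on the same kind of toy problem (my own construction, not code from Shimodaira's papers; a careful implementation would, among other things, weight each scale's $z$-value by the binomial sampling error of its bootstrap proportion):

```r
set.seed(2)
## Toy multiscale bootstrap: test H (population mean inside the unit disk)
## from N bivariate observations whose sample mean falls outside the disk.
N <- 100
x <- cbind(rnorm(N, 1.2), rnorm(N, 0))   # hypothetical data
in_H <- function(m) sqrt(sum(m^2)) <= 1  # hypothesis region for the mean

Ns_grid <- round(N * seq(0.5, 1.4, by = 0.1))  # resample sizes around N
taus <- sqrt(N / Ns_grid)                      # tau^2 = N / N_s
B <- 2000

## Bootstrap probability BP(tau): fraction of size-N_s resamples whose
## mean lands inside the hypothesis region.
bp <- sapply(Ns_grid, function(Ns)
  mean(replicate(B, in_H(colMeans(x[sample(N, Ns, replace = TRUE), ])))))
bp <- pmin(pmax(bp, 0.5 / B), 1 - 0.5 / B)  # keep qnorm() finite
z <- qnorm(1 - bp)                          # bootstrap z-value at each scale

## Fit z(tau) = v / tau + c * tau: v plays the role of a signed distance
## to the boundary, c the role of its curvature.
fit <- lm(z ~ 0 + I(1 / taus) + taus)
v <- coef(fit)[[1]]; cc <- coef(fit)[[2]]

c(p_BP = 1 - pnorm(v + cc),  # ordinary bootstrap probability (tau = 1)
  p_AU = 1 - pnorm(v - cc))  # approximately unbiased p-value: sign of c flipped
```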
The pvclust package in R implements this multiscale bootstrap test for hierarchical clustering in general, annotating each cluster in a dendrogram with its approximately unbiased (AU) $p$-value.
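A quick illustration of its use, with the lung gene-expression dataset that ships with the package (the method and nboot settings below are the package's usual example settings):

```r
library(pvclust)
data(lung)   # gene-expression dataset bundled with pvclust

## Cluster the samples; AU p-values come from the multiscale bootstrap.
result <- pvclust(lung, method.hclust = "average",
                  method.dist = "correlation", nboot = 1000)

plot(result)                  # dendrogram annotated with AU and BP values
pvrect(result, alpha = 0.95)  # box clusters with AU p-value >= 0.95
```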