Why is bootstrapping p-values not just finding the null in the bootstrap samples?

Question

I've been trying to look into the possibility of plucking a p-value for a slightly tricky case from the bootstrap distribution that I'm generating to construct confidence intervals. Everything I'm reading, including here on StackExchange (e.g. https://stats.stackexchange.com/a/277391/54668) and elsewhere, talks about rejigging the data so that the bootstrap samples represent the distribution of the statistic under the null. This makes sense as a way forward. But what I don't understand is, why can we not just look at the percentile of the null hypothesis parameter value in the bootstrap samples we used to generate our CI? I know that is not what the bootstrap samples are supposed to model, but my thinking is as follows:

I understand that the sampling distribution (if the alternative is true) can be wildly different from that if the null is true and further that the bootstrap samples model the sampling distribution of the estimate... But... the following logic then leaves me confused. For simplicity of explanation, I'll talk about one-sided CIs and tests ($H_0: \mu=0, H_1: \mu > 0$). And I'll stick to percentile CIs for simplicity (suppose we can assume they will be good in this case):

Suppose the null ($\mu=0$) is true. A $(1-x)\%$ CI will miss zero $x\%$ of the time, giving $x\%$ falsely significant results (at the $x\%$ level).

Take a tiny $\epsilon$. If the $(x+\epsilon)$th bootstrap percentile lies just above our null (0), then our null (0) lies outside the CI, it is a false positive and statistically significant at the $(x+\epsilon)\%$ level, and $p<(x+\epsilon)$. Conversely, if the $(x-\epsilon)$th percentile lies just below zero, then $p>(x-\epsilon)$. So surely, for that $x$, $(x-\epsilon)<p=x<(x+\epsilon)$? I.e. $p=x$ is the percentile of the null in the bootstrap samples. Is there a gap in this logic? Or is there another reason why we need to do all this shifting around of data to recreate the null distribution?
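To make the proposal concrete, here is a minimal sketch of the quantity being asked about: the percentile at which the null value sits in the ordinary (non-null) bootstrap distribution of the mean. The function names, the made-up data, and the choice of the mean as the statistic are illustrative assumptions, not part of the question. The sketch only computes the number; whether it can be read as a p-value is the point under discussion.

```python
import random


def bootstrap_means(data, n_boot=2000, seed=0):
    """Return n_boot means of resamples drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(n_boot)]


def percentile_of_null(boot_stats, null_value=0.0):
    """Fraction of bootstrap statistics at or below the null value.

    For H1: mu > null_value, this is the level x at which the lower
    endpoint of the one-sided percentile interval reaches the null.
    """
    return sum(s <= null_value for s in boot_stats) / len(boot_stats)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.3, 1.0) for _ in range(30)]  # made-up data
    print(percentile_of_null(bootstrap_means(sample)))
```

Note that the resamples here are drawn from the data as observed, so the bootstrap distribution is centred near the sample mean rather than at the null.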

score 5 · Accepted Answer · answered May 07 '20 at 12:23

Hi: I think what you're missing is that, for bootstrapping to work, not only does the distribution of the "thing" being bootstrapped have to converge to a distribution under the null, but that "thing" also has to be pivotal. By pivotal, it is meant that the distribution of the statistic being bootstrapped doesn't depend on the parameter being tested under the null.

But, if we use the bootstrapped samples themselves, then, clearly, that's not true. If we generate samples from the original population, then the bootstrapped distribution of the sample clearly depends on the value of $\mu$.

The idea of bootstrapping is to be able to avoid a distributional assumption about the original sample by using the fact that the constructed pivotal statistic from the sample (hopefully) converges to a distribution. This way, we can look at the resulting distribution of the pivotal statistic and see where the actual statistic from the original sample falls within that distribution. I hope that helps.
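As a rough illustration of the approach described above, the sketch below bootstraps a studentized mean, which is approximately pivotal, and then checks where the observed statistic falls in that distribution. The function names, the one-sided setup ($H_0: \mu=\mu_0$ against $H_1: \mu>\mu_0$), and the `+1` correction in the p-value are assumptions made for the example, not something stated in the answer.

```python
import math
import random
import statistics


def t_stat(sample, center):
    """Studentized distance of the sample mean from `center`."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - center) / se


def bootstrap_t_p_value(data, mu0=0.0, n_boot=1999, seed=0):
    """One-sided p-value for H0: mu = mu0 against H1: mu > mu0."""
    rng = random.Random(seed)
    n = len(data)
    xbar = statistics.mean(data)
    t_obs = t_stat(data, mu0)
    exceed = 0
    used = 0
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)
        if statistics.stdev(resample) == 0:
            continue  # degenerate resample, t* is undefined
        # Centre on xbar, the mean of the population being resampled,
        # so that t* mimics the studentized statistic under the null.
        t_star = t_stat(resample, xbar)
        used += 1
        exceed += t_star >= t_obs
    return (exceed + 1) / (used + 1)
```

The key design choice is that each resampled statistic is centred on the sample mean, not on `mu0`, which is what lets the resampled values stand in for the null distribution of the studentized statistic.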

mlofton


Thank you for your answer! I've given you an upvote and am not reading up on what it means to be pivotal. – justme May 07 '20 at 13:36
(That should say NOW reading up...!) – justme May 07 '20 at 14:44
OK, so I'm new to all this, so apologies if this is a stupid question. I *think* when our "thing" is pivotal, then percentile CIs are good? In that case, is that covered in the OP, by "suppose we can assume percentile CIs will be good"? If so, can we not cover the more general case by inverting the BCa formulae to convert the percentile of our null in the BS distribution into a "desired alpha" for a CI that would just miss it -- and take that alpha as a p-value? -- possibly there is again something missing in my logic. – justme May 07 '20 at 16:42
(I ask because it would make a current job I'm working on so much easier if I could just invert a p-value from the bootstrap samples). – justme May 07 '20 at 16:44
Hi: I don't understand the comment that starts with OK but if you can generate the samples, then calculating the statistic should not be difficult. If you use the samples themselves as you described, that procedure is not correct. – mlofton May 07 '20 at 17:26
Sorry, can I try to rephrase the question... I *think* BCa intervals do not require that your "thing" be pivotal? If so, and I can reverse engineer which confidence level interval would just touch my null -- would that give me a valid p-value by the relationship between CIs and p-values? (This has certainly set me in a useful direction, thank you! But I'm still struggling to see where the line is drawn.) – justme May 07 '20 at 20:07
Are you trying to bootstrap in order to build a CI? If so, BCa only corrects for bias in the methodology but the empirical bootstrap distribution is still only valid if the constructed statistic (on which the distribution is based) is pivotal. Maybe if you explain why you are reluctant to use the standard approach, someone can help? Note that sampling with replacement from the sample is analogous to sampling from the population. So, when you construct a statistic based on the bootstrap sample, you do not want it to depend on the unknown parameter of the distribution of the population. – mlofton May 08 '20 at 23:07
Firstly, thank you so much for the replies -- it is a nightmare trying to understand things when there is no statistics department at my institution and no one to ask on finer points!! I did ask a more specific question [here](https://stats.stackexchange.com/q/463928/54668), but the closest we could get was applying an ordinal mixed model to a DV with 69 levels, which isn't ideal (though it does work, given the large sample). I understand that the empirical distribution is only valid if the statistic is pivotal, but I *think* the CIs are still valid(ish) from BCa even if it is not? – justme May 09 '20 at 16:12
(If that is not the case, then I need to redo my CIs for this paper, too!! But, I've read over Efron's derivation for BCa, and I've stepped through the bootstrap in the `orddom` package in R, and they are doing the same as I am to bootstrap the exact same statistic (Vargha and Delaney's A -- which is definitely NOT pivotal), so I *think* I'm good...) That other StackExchange question is about trying to solve that problem. This one is really because I can't wrap my head around why I can't just invert BCa CIs to get a p-value -- because I hate not understanding what I'm doing :) – justme May 09 '20 at 16:16
(Though I get now (thanks to you!) that my initial question had the problem that $\mu$ isn't pivotal, so the bootstrap distribution would be invalid. The reason I had stuck to percentiles of the bootstrap distribution was to simplify my question, but I ended up learning something (I think) orthogonal to what I wanted to learn (though v helpful). My broader question still eludes me -- why we can't just find the percent-confidence CI that barely excludes our null and infer a p-value from there. Unless *BCa* CIs are also invalid for non-pivotal statistics. But it feels like acceleration is for just that?) – justme May 09 '20 at 16:31
@justme [This page](https://stats.stackexchange.com/q/355781/28500) describes different types of bootstrapped CI. BCa provides "second-order accurate" CI for non-pivotal statistics. Those CI values, however, are based on the data, without respect to a null hypothesis against which p-values are calculated. It's not that the CI are invalid for BCa, it's the _inversion_ from the CI to a null-hypothesis-based p-value that can't be trusted without a pivotal statistic. See [this page](https://stats.stackexchange.com/q/169141/28500) about such inversions. And with good CI, why do you need a p-value? – EdM May 09 '20 at 19:59
@EdM -- thank you this is really helpful!! The first one, I did read. The second, I'm not sure how I missed, though the "yes" is exactly what I had in mind, and the "no" (I think?) doesn't apply as even BCa CIs are (I think, might be wrong) nested. But I think I get what you're saying -- these CIs are built from the data, so can't be expected to invert in the same way that a CI based on a model for the population would? (I think that's what you're saying). – justme May 09 '20 at 21:33
@EdM The need for p-values, sadly, is publishing optics. For certain reasons, they wanted to put my name on the paper, so I asked if I could soup up the stats with some CIs because it was all p-value based, and they said, sure, but we need the p-values to get published. It's mostly between subjects, but there is one bit where the CIs really work nicely to incorporate a 50/50 between/within subjects bit, which the bootstrap handles elegantly, but I cannot for the life of me figure out any non-hacky way of getting a p-value to "complete" them. I might suggest we leave the p-values off those!! – justme May 09 '20 at 21:36
@justme I think you understand me well enough, although the distinction is really between a statistic based on sampling from the population you have and a statistic based on sampling from a hypothetical population _in which the null hypothesis is true_. I assume you certainly can say that the hypothesized null value lies outside the 95% confidence intervals based on the data, which should be good enough. I suppose you could go for more stringent CI (98%) and make a stronger point. – EdM May 09 '20 at 22:13
@EdM -- Indeed -- it even lies outside of a 99.9% CI and I'm quite sure it would lie outside of a larger CI, too (the Mann-Whitney, if we falsely assume iid, is p=10^-24)... Currently I have "p<.001 [...] – justme May 09 '20 at 23:25
@justme: Assuming EdM is correct (thanks EdM for chiming in. I bet you are correct, I just don't get it) and you don't need "pivotalness" for that BCa methodology, then my apologies for providing wrong advice. I thought that the bootstrap always needed it but my understanding of bootstrapping must have some holes also. I can't follow what you and EdM are discussing but good luck with your project and I hope EdM's entrance into the discussion clears things up for you. – mlofton May 10 '20 at 16:46
@mlofton Your input has been extremely helpful!! And helped me to find more relevant material. I *think* BCa works by assuming an unknown transform that makes the statistic normal with SD linear in the param, then applies a $\log(1+x)$ transform further to *that* which then *finally* gives us something pivotal to work back from... – justme May 11 '20 at 11:26
... so my understanding of it at least does match with @EdM's. More broadly I'm still confused... The best I can get is that my logic only gives that such a p would be uniform under the null, not that the p-values would be ordered in a useful way (small not necessarily being interesting)? And that could be chaotic given we are working back from infinitely many possible non-null distributions. It's confusing me further that I've just read (3.6) in [this paper](https://projecteuclid.org/download/pdf_1/euclid.ss/1032280214) where he does exactly what I'd imagined and says it's "quite accurate"... – justme May 11 '20 at 11:27
@justme: I sort of totally got lost when you and EdM started brainstorming about BCa and I don't have time to try to un-confuse myself right now. I'm glad that I helped a little but, given where you two are in the discussion, I think that I can't help at this point. I think if EdM takes a look at your link, he will have something useful to say. All the best. – mlofton May 12 '20 at 13:25
@mlofton you've helped a lot. I've accepted your answer as it is the correct answer to the specific question that I posed. It was down to my lack of understanding that I posed it clumsily, so it didn't get at the answer I needed. – justme May 13 '20 at 15:42
Thanks. I'm glad that I helped some. Someday, I'd like to come back to this and understand the discussion between you and EdM. I can't right now because it looks complicated and would take too much time given what I have available at the moment. I hope that you worked out or are getting near a solution to your problem. – mlofton May 14 '20 at 16:14
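
For readers following the BCa part of the comment thread, the inversion justme describes (finding the nominal level whose BCa endpoint just touches the null) can be sketched algebraically. This is only a sketch under assumptions: it uses the standard BCa percentile mapping $\Phi\left(z_0 + \frac{z_0 + z_\alpha}{1 - a(z_0 + z_\alpha)}\right)$, takes the bias correction `z0` and acceleration `a` as already estimated elsewhere, and uses hypothetical function names. It does not settle whether the resulting level is a trustworthy p-value, which is what the thread debates.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bca_percentile(alpha, z0, a):
    """Bootstrap percentile used as the BCa endpoint at nominal level alpha."""
    u = z0 + _STD_NORMAL.inv_cdf(alpha)
    return _STD_NORMAL.cdf(z0 + u / (1.0 - a * u))


def invert_bca(q, z0, a):
    """Nominal level alpha whose BCa endpoint sits at bootstrap percentile q.

    Requires 0 < q < 1 and 1 + a * w > 0; otherwise no such level exists.
    """
    w = _STD_NORMAL.inv_cdf(q) - z0
    denom = 1.0 + a * w
    if denom <= 0:
        raise ValueError("no BCa level maps to this percentile")
    u = w / denom
    return _STD_NORMAL.cdf(u - z0)
```

Here `q` would be the fraction of bootstrap statistics below the null value. With `z0 = 0` and `a = 0` the mapping is the identity, so the result reduces to the plain percentile of the null.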
