I have a question about computing a p-value from bootstrapped data, which is very similar to the question here.

I have a numeric vector S of bootstrapped simulated data (for instance, N values between 0 and 1) and a single observed numeric value X. I want to compute a p-value that indicates whether the observed value lies in the upper or lower tail, that is, if I understand correctly, whether X is significantly lower or higher than expected under the null distribution.

I found a simple formula for the p-value, which gives the chance of X being in the upper tail:

p = sum(S > X)/N, and its corrected version: p = (sum(S > X)+1)/(N+1). So if p > 0.05, X is most likely not in the upper tail. According to an article I found (here, on page 2), this p-value can be seen as the proportion of bootstrapped values which are more extreme than the observed value.
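To make the two formulas concrete, here is a minimal Python sketch of them. The function name `upper_tail_p` and the toy values are only illustrative assumptions, not something taken from the article.

```python
def upper_tail_p(S, X, corrected=True):
    """Upper-tail p-value: share of bootstrapped values strictly above X.

    S is the list of bootstrapped values, X the observed value.
    With corrected=True, 1 is added to numerator and denominator.
    """
    N = len(S)
    exceed = sum(1 for s in S if s > X)  # same as sum(S > X)
    if corrected:
        return (exceed + 1) / (N + 1)
    return exceed / N


# Toy data (hypothetical): two of the four values are above X.
S = [0.1, 0.2, 0.3, 0.4]
X = 0.25
print(upper_tail_p(S, X, corrected=False))  # 0.5
print(upper_tail_p(S, X))                   # 0.6
```

The corrected version can never return exactly 0, which is the practical difference between the two when no bootstrapped value exceeds X.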

My questions (probably a bit silly) are:

  • Is the equivalent formula for the lower tail p = sum(S < X)/N? (Edit: according to the article I'm reading, yes.)
  • Is there a formula for a two-sided p-value? If I add the results of the two previous formulas, I'll get 1, of course.
  • As mentioned in the question linked above, is it correct to use the Z-table (assuming my bootstrapped data are normally distributed)?

Thanks in advance.

Micawber

1 Answer

General: please read the answer by @whuber to the question that you are linking to.

Ad 1: yes

Ad 2: I think it is more complicated.

If S follows a distribution which is symmetric around 0, then you can simply use sum(abs(S) > abs(X))/N
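As a rough illustration of the symmetric case, a Python sketch could look like this. The function name `two_sided_p_symmetric` is hypothetical, and the calculation is only meaningful under the stated assumption that S is symmetric around 0.

```python
def two_sided_p_symmetric(S, X):
    """Two-sided p-value, assuming S is symmetric around 0.

    Counts the bootstrapped values whose absolute value exceeds
    the absolute value of the observation, then divides by N.
    """
    N = len(S)
    return sum(1 for s in S if abs(s) > abs(X)) / N


# Toy data (hypothetical), symmetric around 0.
S = [-2, -1, 0, 1, 2]
print(two_sided_p_symmetric(S, 1.5))  # 0.4
```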

However, if I did not know anything about the distribution of S, then I would take a pair of quantiles of the empirical S distribution and ask whether X falls outside them. For example, if

X < quantile(S, 0.005) | X > quantile(S, 0.995)

then I would say that p < 0.01 in a two-tailed comparison.
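The quantile check can be sketched in Python roughly as follows. Both helper names (`quantile`, `outside_central_interval`) are assumptions for this example, and the quantile helper uses plain linear interpolation between order statistics, which is one of several possible conventions.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)          # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


def outside_central_interval(S, X, alpha=0.01):
    """True if X lies outside the central (1 - alpha) range of S.

    With alpha = 0.01 this compares X against the 0.005 and 0.995
    quantiles, i.e. the two-tailed check at the 1% level.
    """
    lower = quantile(S, alpha / 2)
    upper = quantile(S, 1 - alpha / 2)
    return X < lower or X > upper


# Toy data (hypothetical): the integers 0..1000.
S = list(range(1001))
print(outside_central_interval(S, 2))    # True: below the 0.005 quantile
print(outside_central_interval(S, 500))  # False: well inside the range
```

Note that this only gives a yes/no decision at a chosen level rather than an exact p-value.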

Ad 3: I think the answer is no, since you would be calculating your z-scores from a mean that is itself computed from the samples undergoing the test.

January