I have a question about computing a p-value from bootstrapped data, very similar to the question here.
I have a numeric vector $S$ of bootstrapped simulated data (for instance $N$ values between 0 and 1) and one observed value $X$. I want to compute a p-value that indicates whether the observed value lies in the upper or lower tail (that is, if I understand correctly, whether $X$ is significantly lower or higher than the null distribution).
I found a simple formula for the p-value that estimates the chance of $X$ being in the upper tail:
`p = sum(S > X)/N`
and its corrected version:
`p = (sum(S > X) + 1)/(N + 1)`.
So if $p > 0.05$, $X$ is most likely not in the upper tail. According to an article I found (here, on page 2), this p-value can be read as the proportion of bootstrapped values that are more extreme than the observed value.
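A minimal Python sketch of the two upper-tail formulas above, assuming `S` is a plain list of bootstrapped values and `X` a single observed number. The function names and the toy data are hypothetical, for illustration only:

```python
import random


def upper_tail_p(S, X):
    """Raw upper-tail p-value: proportion of bootstrapped values strictly above X."""
    N = len(S)
    return sum(s > X for s in S) / N


def upper_tail_p_corrected(S, X):
    """Corrected version (sum(S > X) + 1) / (N + 1); it can never be exactly 0."""
    N = len(S)
    return (sum(s > X for s in S) + 1) / (N + 1)


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical null distribution: N = 1000 values between 0 and 1.
    S = [random.random() for _ in range(1000)]
    X = 0.97  # hypothetical observed value
    print(upper_tail_p(S, X), upper_tail_p_corrected(S, X))
```

One common reading of the `+ 1` terms is that the observed value is counted as one more draw from the null, so with a finite number of bootstrap replicates the estimate never reaches exactly 0.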
My questions (probably a bit silly) are:
- Is the equivalent formula for the lower tail `p = sum(S < X)/N`? (Edit: according to the article I'm reading, yes.)
- Is there a formula for a two-sided p-value? If I add the results of the two previous formulas, I of course get 1 (as long as no bootstrapped value equals $X$ exactly).
- As mentioned in the question linked above, is it correct to use the Z-table (assuming my bootstrapped data are normally distributed)?
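A sketch of the three points in the list, again in Python with hypothetical names. The lower tail mirrors the upper-tail formula. For the two-sided p-value it uses one common convention, doubling the smaller one-sided p-value and capping at 1; that convention is an assumption here, not something taken from the article. The Z-table version standardizes $X$ with the bootstrap mean and standard deviation, which is only reasonable if $S$ is roughly normal; the empirical proportions do not need that assumption.

```python
import random
from statistics import NormalDist, mean, stdev


def lower_tail_p(S, X):
    """Lower-tail p-value: proportion of bootstrapped values strictly below X."""
    return sum(s < X for s in S) / len(S)


def upper_tail_p(S, X):
    """Upper-tail p-value: proportion of bootstrapped values strictly above X."""
    return sum(s > X for s in S) / len(S)


def two_sided_p(S, X):
    """One common two-sided convention (assumed): double the smaller tail, cap at 1."""
    return min(1.0, 2 * min(lower_tail_p(S, X), upper_tail_p(S, X)))


def z_test_p(S, X):
    """Normal approximation (the 'Z-table'): standardize X with the bootstrap
    mean and SD, then take the two-sided standard-normal tail probability.
    Only sensible if S is approximately normally distributed."""
    z = (X - mean(S)) / stdev(S)
    # cdf(-|z|) is more accurate than 1 - cdf(|z|) far out in the tail.
    return 2 * NormalDist().cdf(-abs(z))


if __name__ == "__main__":
    random.seed(2)
    S = [random.gauss(0.5, 0.1) for _ in range(2000)]  # hypothetical, roughly normal
    X = 0.72  # hypothetical observed value
    print(lower_tail_p(S, X), upper_tail_p(S, X))
    print(two_sided_p(S, X), z_test_p(S, X))
```

When no bootstrapped value equals `X`, the two one-sided proportions sum to 1, which is why adding them does not give a two-sided p-value.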
Thanks in advance.