
I have a basic question about using the bootstrap to calculate a p-value.

Suppose I have a dataset that is likely normally distributed. I want to determine whether a particular value X is significantly high relative to the dataset by reporting a p-value.

As I see it, there are two options:

  1. Since the dataset is roughly normally distributed, I could calculate the mean and sd, look up the upper-tail area for z > (X - mean)/sd in a Z table, and use that as the p-value.

  2. By bootstrap: I could generate a new sample S (of length N) from the original dataset by resampling with replacement, then take the p-value to be sum(S > X)/N.
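
The two options above can be sketched roughly as follows. This is only an illustration in Python using the standard library; the function names, the simulated data, and the value of X are all made up for the example and are not part of the original problem.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_p_value(data, x):
    """Option 1: upper-tail area P(Z > (x - mean)/sd) under a fitted normal."""
    mu = mean(data)
    sd = stdev(data)  # sample standard deviation (n - 1 denominator)
    return 1.0 - NormalDist(mu, sd).cdf(x)


def resample_p_value(data, x, seed=0):
    """Option 2 as described: draw S (length N) with replacement from the
    data and return sum(S > x) / N."""
    rng = random.Random(seed)
    n = len(data)
    sample = rng.choices(data, k=n)  # sampling with replacement
    return sum(v > x for v in sample) / n


def empirical_p_value(data, x):
    """Fraction of the original observations above x, for comparison."""
    return sum(v > x for v in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical dataset
    x = 2.0                                           # hypothetical value X
    print("option 1 (normal tail):", normal_p_value(data, x))
    print("option 2 (resampled):  ", resample_p_value(data, x))
    print("empirical fraction:    ", empirical_p_value(data, x))
```

Note that the quantity in option 2 is the empirical fraction sum(data > X)/N computed on a resampled copy of the data, so on average it equals that empirical fraction, with extra resampling noise on top.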

My question is: is it correct to use the bootstrap as in the second option? Could you also recommend books for a better understanding of the bootstrap and permutation tests?

Thanks.

ccshao
  • What do you mean by "significant high"? Do you want to check whether X is an outlier? – mlwida Oct 23 '12 at 11:11
  • No, I want to give a p-value for data that are equal to or higher than X. – ccshao Oct 23 '12 at 11:42
  • Unless I misread it, the question sounds like an effort to detect outliers. The bootstrap doesn't apply here and, besides, it's fruitless. Even the *maximum* value in the dataset is going to occur in $1 - ((n-1)/n)^n \approx 1 - 1/e \approx 0.63$ of the bootstrap samples, which is far too high to give you any chance at achieving a low p-value. For the books, see http://stats.stackexchange.com/questions/5845, http://stats.stackexchange.com/questions/15692, and http://stats.stackexchange.com/questions/25151. – whuber Oct 23 '12 at 15:19
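
The 0.63 figure in the last comment can be checked numerically. Below is a minimal sketch in Python (standard library only); the function names are illustrative, and the simulation uses arbitrary placeholder data with distinct values rather than any real dataset.

```python
import math
import random


def prob_max_in_bootstrap(n):
    """Probability that one specific observation (e.g. the maximum) appears
    at least once in a bootstrap sample of size n: 1 - ((n-1)/n)^n."""
    return 1.0 - ((n - 1) / n) ** n


def simulate_max_in_bootstrap(n, n_boot=20000, seed=0):
    """Monte Carlo check: fraction of bootstrap samples containing the maximum."""
    rng = random.Random(seed)
    data = list(range(n))  # placeholder data with distinct values
    top = max(data)
    hits = 0
    for _ in range(n_boot):
        # one bootstrap sample: n draws with replacement
        if top in rng.choices(data, k=n):
            hits += 1
    return hits / n_boot


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, prob_max_in_bootstrap(n))
    print("limit 1 - 1/e:", 1.0 - math.exp(-1.0))
    print("simulated, n = 20:", simulate_max_in_bootstrap(20))
```

The exact expression approaches 1 - 1/e from above as n grows, so the approximation in the comment is already close for moderate sample sizes.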
