I have a small sample from an unknown, possibly non-parametric population. I would like to create a new, different sample from the population based only on what I can extract from the small sample I have.

Is there a way to do it, given that my sample is small? Specifically, can I resample from that sample using bootstrapping and hope I can get closer to the "true" mean and variance of the population? (Let's assume for a moment these two parameters are enough to characterize the population.)

gung - Reinstate Monica
Noale
  • Populations aren't non-parametric (or parametric); rather, statistical procedures are classified that way. – Nick Cox Aug 30 '17 at 10:30
  • Bootstrapping won't guarantee that you get 'closer' to the true population mean and variance. But it can be used to construct (non-parametric) confidence intervals for the mean and variance, which allow you to quantify your uncertainty about those quantities. – MånsT Aug 30 '17 at 10:32
  • Thanks Nick. What I meant was: if I use bootstrapping to resample, will it indeed get me closer to the actual population mean and variance, even though my only basis is a single small sample that doesn't necessarily represent the population? – Noale Aug 30 '17 at 10:32
  • Others have already addressed the misconception that bootstrapping can get you closer. (How would that work? Random numbers used in selection can't possibly know which values should be chosen. Also, a bootstrap sample can't reach into parts of the population not already sampled.) – Nick Cox Aug 30 '17 at 10:35
  • @ the OP and @NickCox, this other question might show a good example of what can go wrong in (small sample) bootstrapping: https://stats.stackexchange.com/questions/256505/strange-pattern-in-standard-deviation-confidence-interval-estimation-via-bootstr/256514#256514. Additionally, the OP could consider whether this answers the question, or whether it is a duplicate (aside from the correct answer by kjetil b halvorsen). – IWS Aug 30 '17 at 11:03
  • @IWS Thanks for reminding me of that thread, which is certainly salutary (and we exchanged thoughts there too). I don't think this is quite a duplicate, but there's useful overlap. – Nick Cox Aug 30 '17 at 11:09
  • There is enough information here for an answer to be given, which may have some value for the OP. (Moreover, @kjetilbhalvorsen's answer is both correct & covers all that can be said.) I'm voting to leave open. – gung - Reinstate Monica Aug 30 '17 at 15:22
  • The smaller the sample, the smaller the utility of bootstrapping. – Aksakal Aug 30 '17 at 15:29
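
To make the confidence-interval suggestion in the comments concrete, here is a minimal sketch of a percentile bootstrap interval for the mean, using only the Python standard library. The data values, the function name `bootstrap_percentile_ci`, and its defaults are made up for illustration; they do not come from the question. Note that every resampled value is drawn from the original sample, so the interval can never extend beyond the observed minimum and maximum.

```python
import random
import statistics

def bootstrap_percentile_ci(sample, stat=statistics.mean,
                            n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Each replicate resamples n values from the original sample,
    # with replacement, and recomputes the statistic.
    replicates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the replicates.
    lo = replicates[round((alpha / 2) * n_boot)]
    hi = replicates[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical small sample, invented for this illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 2.8, 4.2, 3.1, 2.5]

lo, hi = bootstrap_percentile_ci(sample)
print(f"sample mean = {statistics.mean(sample):.3f}")
print(f"95% percentile bootstrap CI for the mean: ({lo:.3f}, {hi:.3f})")
```

The interval describes uncertainty around the sample mean; it does not move the estimate toward the population value, and with a sample this small its actual coverage may be noticeably below the nominal 95%.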

2 Answers


No, you cannot. Bootstrap isn't magic: it cannot create new information. If you want/need a new, different sample from the population the only way to get that is to sample from the population!

Bootstrapping and resampling are ways to analyze the information in your sample. Their grounding, apart from their intuitive appeal, is in large-sample theory, that is, approximations that assume a large sample size. So if your sample is very small, bootstrapping might not be a good way to analyze it. You haven't given us enough details and context to say much more.

Nick Cox
kjetil b halvorsen
  • So, the idea is that all I have is a small sample of my population, and I would like to know what information I can learn about the population. Let's say I would like to discover if I have a fair coin, based on only 5 tosses. Is it even possible? – Noale Aug 30 '17 at 10:41
  • If you really doubt the coin is fair, NO, 5 throws will not be enough to test the null hypothesis $H_0 \colon p=0.5$. See https://stats.stackexchange.com/questions/5566/testing-if-a-coin-is-fair – kjetil b halvorsen Aug 30 '17 at 10:55
  • If all 5 tosses came as tails, then hell yeah! I'll say your coin's not fair with certainty beyond any reasonable doubt. It never happens. The reasonable doubt is defined as 95% confidence :) – Aksakal Aug 30 '17 at 15:31
  • @Aksakal: Maybe; we don't really know enough about the OP's context. – kjetil b halvorsen Aug 30 '17 at 15:32
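
To put numbers on the five-toss coin example from these comments, here is a small sketch of the exact binomial calculation under the null hypothesis $p = 0.5$. The helper names `binom_pmf` and `two_sided_p_value` are illustrative, not from the thread, and the two-sided rule used here (sum the probabilities of all outcomes no more likely than the observed one) is one common convention among several.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(k, n, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    observed = binom_pmf(k, n, p)
    return sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-9)  # tolerate rounding
    )

n = 5
p_all_tails = binom_pmf(0, n)          # zero heads in five tosses
p_two_sided = two_sided_p_value(0, n)  # all tails or all heads

print(f"P(5 tails | fair coin) = {p_all_tails}")
print(f"two-sided p-value      = {p_two_sided}")
```

Under a fair coin, five tails has probability 1/32, about 0.031, and the two-sided p-value is 1/16 = 0.0625. So even the most extreme outcome of five tosses does not reach the 5% level in a two-sided test, which is consistent with the point that five throws are too few.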

What bootstrapping CAN give you is an estimate of, for example, how much your estimated mean would vary across samples of the same size as yours. This is, however, not the same thing as the variance in your data (it is the variance of the mean across samples, as opposed to the variance within a sample). For the mean in particular, bootstrapping won't move the point estimate at all: you can take as many bootstrap samples as you want, and the mean won't change in expectation.
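
A quick simulation can illustrate both points. This is only a sketch on made-up data, and the variable names are chosen for illustration: the average of many bootstrap means lands back on the original sample mean, while the spread of those bootstrap means estimates the standard error of the mean, which is much smaller than the standard deviation within the sample.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical small sample, invented for this illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 2.8, 4.2, 3.1, 2.5]
n = len(sample)

# Mean of each bootstrap resample (n draws with replacement).
boot_means = [
    statistics.mean(rng.choices(sample, k=n)) for _ in range(10_000)
]

sample_mean = statistics.mean(sample)
mean_of_boot_means = statistics.mean(boot_means)  # close to sample_mean
boot_se = statistics.stdev(boot_means)            # spread of the mean
within_sd = statistics.stdev(sample)              # spread of the data
# Standard error implied by treating the sample as the population.
plug_in_se = statistics.pstdev(sample) / n ** 0.5

print(f"sample mean            = {sample_mean:.3f}")
print(f"mean of bootstrap means = {mean_of_boot_means:.3f}")
print(f"bootstrap SE of mean    = {boot_se:.3f}")
print(f"plug-in SE of mean      = {plug_in_se:.3f}")
print(f"SD within the sample    = {within_sd:.3f}")
```

The bootstrap standard error should roughly match the plug-in value (the sample's population-style standard deviation divided by the square root of n), which shows that for the mean the bootstrap reproduces what a simple formula already gives.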

Sam