Why do we need to resample from an initial set of samples when using bootstrapping? Why don't we just take fresh sets of samples from the original distribution? What is the justification behind resampling? Or is it just a computational trick?

EDIT: I understand that it is usually very expensive to get fresh samples, but assuming we have access to a generator of the original distribution, does it still make sense to resample due to some theoretical reasons?

Vivek
  • In life, sampling from the true distribution is expensive or impossible. It's the money. https://www.youtube.com/watch?v=s2VG53RIJ50 – Matthew Drury Jan 18 '16 at 17:04
  • Fresh samples would presumably mean using other similar datasets, and that would often be a very good idea. The problem is that often there is little or no scope for collecting more data, at least with available resources. (You wouldn't relaunch the _Titanic_ and crash it into another iceberg.) So the idea is to use the present dataset itself to tell you about variability. And no, it's not just a computational trick. – Nick Cox Jan 18 '16 at 17:07
  • The question is good but I think we need to be clear that it isn't already answered. – Nick Cox Jan 18 '16 at 17:12
  • @StephanKolassa Done. – Matthew Drury Jan 18 '16 at 17:12
  • @NickCox: I don't think [this](http://stats.stackexchange.com/questions/26088/explaining-to-laypeople-why-bootstrapping-works) is a duplicate. Yes, the answers address why you wouldn't simply sample again... but only in passing. I searched a bit for "why resample is:q" and similar and came up empty-handed. – Stephan Kolassa Jan 18 '16 at 17:14
  • Updated the question. @NickCox My question is more about in what ways is it more than a computational trick? – Vivek Jan 18 '16 at 17:39
  • I think it would be better to add to an existing strong thread than to start a new one. Our problems in middle age are getting bigger and splitting up! – Nick Cox Jan 18 '16 at 17:40
  • I remain unconvinced that the question as edited is genuinely new (or, a different matter, precise and focused enough to be answered with reasonable effort). – Nick Cox Jan 18 '16 at 17:42
  • I read the first few pages of the Bootstrap paper (http://projecteuclid.org/download/pdf_1/euclid.aos/1176344552), and now it makes more sense. The problem the bootstrap is trying to solve is what to do when you don't have access to the "generator" of the samples and just have one set of samples. Thanks – Vivek Jan 18 '16 at 17:54

1 Answer

In real life, sample points have real costs, so sampling from the true distribution is, at best, expensive, and at worst, impossible. So, it's the money. The bootstrap is mostly free.

As @NickCox puts it with exceptional visual aplomb:

> You wouldn't relaunch the Titanic and crash it into another iceberg

Because, you know, that would be expensive (in many, many ways).

> I understand that it is usually very expensive to get fresh samples, but assuming we have access to a generator of the original distribution, does it still make sense to re-sample due to some theoretical reasons?

No. If it's free to sample from the population distribution, you should do so.
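To make the comparison concrete, here is a minimal sketch, not anything from the original thread. It assumes a hypothetical generator `draw_exponential` standing in for "the original distribution" and uses the sample median as the statistic of interest; both choices, and the function names, are made up for illustration. With a free generator you can measure the variability of the median directly from fresh samples. With a single observed sample, the bootstrap approximates the same quantity by resampling that sample with replacement.

```python
import random
import statistics


def draw_exponential(rng):
    """Hypothetical 'free generator' for the population: Exponential(rate=1)."""
    return rng.expovariate(1.0)


def se_from_fresh_samples(draw, n, reps, rng):
    """Standard error of the median, measured across `reps` fresh samples of size n."""
    medians = []
    for _ in range(reps):
        fresh = [draw(rng) for _ in range(n)]
        medians.append(statistics.median(fresh))
    return statistics.stdev(medians)


def se_from_bootstrap(sample, reps, rng):
    """Standard error of the median, estimated by resampling one sample with replacement."""
    n = len(sample)
    medians = []
    for _ in range(reps):
        resample = rng.choices(sample, k=n)  # draw n points with replacement
        medians.append(statistics.median(resample))
    return statistics.stdev(medians)


rng = random.Random(0)  # fixed seed so the run is repeatable
n, reps = 200, 2000

# The one data set you would have in the usual (expensive-data) situation.
observed = [draw_exponential(rng) for _ in range(n)]

se_fresh = se_from_fresh_samples(draw_exponential, n, reps, rng)
se_boot = se_from_bootstrap(observed, reps, rng)

print(f"SE of median from fresh samples: {se_fresh:.4f}")
print(f"SE of median from the bootstrap: {se_boot:.4f}")
```

The fresh-sample figure depends only on the population and the number of repetitions, while the bootstrap figure also inherits whatever quirks the one observed sample happens to have. That is the sense in which fresh sampling is preferable when it really is free.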

Matthew Drury