While studying machine learning I read about different sampling methods. Simple holdout and N-fold cross-validation are straightforward. However, I am missing the point of bootstrapping. Its definition says that it is just a way to inflate the sample set by duplicating some randomly chosen samples, and I cannot figure out what the point of this is: seemingly the learning process gains no additional information from seeing the same instances again and again. (On the contrary, others say that omitting redundant points from the training set is recommended for computational efficiency and for some other statistical reasons as well.)

So what is the explanation here?

Fredrik
  • But you're not seeing the same samples again and again. You're taking a different sample each time. With bootstrapping, you're taking a simple random sample *with replacement* from the original sample. You end up with many different samples. For example, if your original sample was $S=\{1,2,3\}$, some bootstrap samples might be: $S_1^*=\{1,1,1\}$, $S_2^*=\{1,2,3\}$, $S_3^*=\{3,2,3\}$, $S_4^*=\{2,2,3\}$, etc. – StatsStudent Dec 11 '20 at 08:30
  • But in the case of $S_1^* = \{1,1,1\}$ I still show my estimator the same single instance three times. It's pointless. – Fredrik Dec 11 '20 at 08:50
  • If you want to sample from a distribution defined in terms of samples, redundant samples are equivalent to changing the weights, or importance, of those redundant samples. – hakanc Dec 11 '20 at 08:55
  • I see, but in bootstrapping the redundant samples are chosen randomly, which in this sense means that we randomly assign weights to certain points. Why, and on what basis? This also seems irrational. – Fredrik Dec 11 '20 at 09:03
  • The distribution $S_1 = \{a,a,b\}$ with equal probability for all elements and the distribution $S_2 = \{a,b\}$ with $p_a = 2/3$ and $p_b = 1/3$ are equivalent when bootstrapping. – hakanc Dec 11 '20 at 09:08
  • My answer at https://stats.stackexchange.com/a/290855/919 might supplement the duplicates by providing additional insight into the theoretical characterization of resampling. – whuber Dec 11 '20 at 12:49
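
To make the two points from the comments concrete (resampling with replacement, and duplicates acting as weights), here is a minimal Python sketch using NumPy. The data values, the seed, and the number of resamples are made up for illustration, and the standard error of the mean is just one convenient statistic to try; none of this comes from the thread itself. It draws many bootstrap resamples, uses the spread of their means as an estimate of the standard error, and checks that each resample is the same thing as a random re-weighting of the original points.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Hypothetical original sample; the values are made up for illustration.
sample = np.array([2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7])
n = len(sample)
n_boot = 2000  # number of bootstrap resamples (an arbitrary choice)

# Each row holds the indices of one resample drawn with replacement.
idx = rng.integers(0, n, size=(n_boot, n))
boot_means = sample[idx].mean(axis=1)

# The spread of the resampled means estimates the standard error of the mean.
boot_se = boot_means.std(ddof=1)
classic_se = sample.std(ddof=1) / np.sqrt(n)

# Duplicates act as weights: counting how often each point was drawn and
# taking a weighted mean gives the same value as averaging the resample.
counts = np.stack([np.bincount(row, minlength=n) for row in idx])
weighted_means = counts @ sample / n
assert np.allclose(weighted_means, boot_means)

print(f"bootstrap SE: {boot_se:.3f}, formula SE: {classic_se:.3f}")
```

No single resample adds information. What is useful is the variation across many resamples, which approximates how much the statistic would vary if new samples could be drawn from the underlying population.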

0 Answers