Questions tagged [resampling]

Resampling is taking a sample from a sample. Common uses are jackknifing (taking a subsample, e.g., all values but one) and bootstrapping (sampling with replacement). These techniques can provide a robust estimate of a sampling distribution when it would be difficult or impossible to derive analytically.

Resampling methods can help with providing confidence intervals and performing statistical inference without assuming a known probability distribution for the data.
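As a rough illustration of the two techniques named above, here is a minimal Python sketch using only the standard library. The function names (`bootstrap_ci`, `jackknife_se`), the sample values, and the defaults (2000 resamples, a 95% percentile interval, a fixed seed) are assumptions made for this example, not part of the tag description.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate draws n values with replacement and recomputes the statistic.
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower = replicates[int((alpha / 2) * n_resamples)]
    upper = replicates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def jackknife_se(data, stat=statistics.mean):
    """Leave-one-out jackknife standard error of stat(data)."""
    n = len(data)
    # n replicates, each computed with exactly one observation removed.
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    return ((n - 1) / n * sum((x - loo_mean) ** 2 for x in loo)) ** 0.5


# Hypothetical sample, used only to exercise the two functions.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
low, high = bootstrap_ci(sample)
se = jackknife_se(sample)
print(f"bootstrap 95% CI for the mean: ({low:.2f}, {high:.2f})")
print(f"jackknife standard error of the mean: {se:.3f}")
```

For the sample mean, the jackknife standard error coincides with the usual formula (sample standard deviation divided by the square root of n), which makes a convenient sanity check. Neither function assumes a particular distribution for the data, but both treat the observations as independent and identically distributed.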

Reference: Statistical Methods by Arnaud Delorme.

372 questions
89 votes
2 answers

Resampling / simulation methods: monte carlo, bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests

I am trying to understand the difference between different resampling methods (Monte Carlo simulation, parametric bootstrapping, non-parametric bootstrapping, jackknifing, cross-validation, randomization tests, and permutation tests) and their…
Ram Sharma
  • 2,226
  • 3
  • 20
  • 24
38 votes
2 answers

Why use stratified cross validation? Why does this not damage variance related benefit?

I've been told that it is beneficial to use stratified cross validation, especially when response classes are unbalanced. If one purpose of cross-validation is to help account for the randomness of our original training data sample, surely making each…
James Owers
  • 627
  • 1
  • 5
  • 11
35 votes
5 answers

Can you overfit by training machine learning algorithms using CV/Bootstrap?

This question may well be too open-ended to get a definitive answer, but hopefully not. Machine learning algorithms, such as SVM, GBM, Random Forest, etc., generally have some free parameters that, beyond some rule of thumb guidance, need to be tuned…
29 votes
2 answers

How well does bootstrapping approximate the sampling distribution of an estimator?

Having recently studied bootstrap, I came up with a conceptual question that still puzzles me: You have a population, and you want to know a population attribute, i.e. $\theta=g(P)$, where I use $P$ to represent population. This $\theta$ could be…
KevinKim
  • 6,347
  • 4
  • 21
  • 35
26 votes
2 answers

Testing Classification on Oversampled Imbalance Data

I am working on severely imbalanced data. In the literature, several methods are used to re-balance the data using re-sampling (over- or under-sampling). Two good approaches are: SMOTE: Synthetic Minority Over-sampling TEchnique; ADASYN:…
26 votes
1 answer

What are the assumptions of the permutation test?

It's often stated that permutation tests have no assumptions; however, this is certainly not true. For example, if my samples are somehow correlated, I can imagine that permuting their labels would not be the correct thing to do. Only thing I found…
rep_ho
  • 6,036
  • 1
  • 22
  • 44
23 votes
2 answers

Caret re-sampling methods

I am using the library caret in R to test various modelling procedures. The trainControl object allows one to specify a re-sampling method. The methods are described in the documentation section 2.3 and include: boot, boot632, cv, LOOCV, LGOCV,…
Ram Ahluwalia
  • 3,003
  • 6
  • 27
  • 38
22 votes
2 answers

Test for IID sampling

How would you test or check that sampling is IID (Independent and Identically Distributed)? Note that I do not mean Gaussian and Identically Distributed, just IID. An idea that comes to my mind is to repeatedly split the sample in two sub-samples…
17 votes
2 answers

What is the procedure for "bootstrap validation" (a.k.a. "resampling cross-validation")?

"Bootstrap validation"/"resampling cross-validation" is new to me, but was discussed by the answer to this question. I gather it involves 2 types of data: the real data and simulated data, where a given set of simulated data is generated from the…
Mike Lawrence
  • 12,691
  • 8
  • 40
  • 65
17 votes
2 answers

Best suggested textbooks on Bootstrap resampling?

I just wanted to ask which, in your opinion, are the best available books on the bootstrap. By this I don't necessarily mean only those written by its developers. Could you please indicate which textbook is, in your view, the best for…
16 votes
1 answer

Bootstrap methodology. Why resample "with replacement" instead of random subsampling?

The bootstrap method has become widespread in recent years, and I also use it a lot, especially because the reasoning behind it is quite intuitive. But there is one thing I don't understand. Why did Efron choose to resample with replacement instead of…
Bakaburg
  • 2,293
  • 3
  • 21
  • 30
15 votes
3 answers

Why is bootstrapping useful?

If all you are doing is re-sampling from the empirical distribution, why not just study the empirical distribution? For example instead of studying the variability by repeated sampling, why not just quantify the variability from the empirical…
14 votes
1 answer

Why not always use bootstrap CIs?

I was wondering how bootstrap CIs (and BCa in particular) perform on normally-distributed data. There seems to be lots of work examining their performance on various types of distributions, but I could not find anything on normally-distributed data.…
14 votes
1 answer

Why is the jackknife less computationally intensive than the bootstrap?

It's often claimed that the jackknife is less computationally intensive. How is that the case? My understanding is that the jackknife involves the following steps: (1) Remove 1 data point. (2) Estimate the statistic of interest (e.g. sample mean) on the…
Heisenberg
  • 4,239
  • 3
  • 23
  • 54
14 votes
1 answer

Is this method of resampling time-series known in the literature? Does it have a name?

I was recently looking for ways to resample time series in ways that: (1) approximately preserve the auto-correlation of long memory processes; (2) preserve the domain of the observations (for instance, a resampled time series of integers is still a time…
gui11aume
  • 13,383
  • 2
  • 44
  • 89