Subsampling is a resampling procedure akin to the bootstrap in which fewer than all n observations are drawn (m < n), classically without replacement; drawing m < n observations with replacement is the closely related m-out-of-n bootstrap. The textbook bootstrap, by contrast, draws n observations with replacement. For creating samples out of your existing data, please consider the "sampling" tag instead.
Questions tagged [subsampling]
66 questions
16 votes, 1 answer
Bootstrap methodology. Why resample "with replacement" instead of random subsampling?
The bootstrap method has become very widespread in recent years. I also use it a lot, especially because the reasoning behind it is quite intuitive.
But there is one thing I don't understand: why did Efron choose to resample with replacement instead of…

Bakaburg (2,293 reputation)
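For the question above, one way to see why plain random subsampling is not a drop-in replacement for resampling with replacement is to compare the exact variance of the resampled mean under each scheme. Drawing all n observations without replacement only permutes the data, so the mean never changes. A subsample of size b < n does vary, but by a different amount than a bootstrap resample. The following is a minimal sketch on made-up data, and the helper names are hypothetical.

```python
from itertools import combinations
from statistics import fmean, pvariance

def bootstrap_mean_variance(data):
    # Variance of the mean of n i.i.d. draws from the empirical distribution:
    # sigma^2 / n, with sigma^2 computed using divisor n (pvariance).
    return pvariance(data) / len(data)

def subsample_mean_variance(data, b):
    # Exact variance of the mean of b draws WITHOUT replacement from n values:
    # (sigma^2 / b) * (n - b) / (n - 1), i.e. with the finite-population correction.
    n = len(data)
    return pvariance(data) / b * (n - b) / (n - 1)

def enumerate_subsample_variance(data, b):
    # Brute-force check: variance of the mean over all C(n, b) subsets.
    return pvariance([fmean(s) for s in combinations(data, b)])

data = [1.0, 2.0, 3.0, 4.0]                   # made-up example data
print(bootstrap_mean_variance(data))          # 0.3125
print(subsample_mean_variance(data, 2))       # about 0.4167
print(enumerate_subsample_variance(data, 2))  # about 0.4167, matches the formula
print(subsample_mean_variance(data, 4))       # 0.0: b = n just permutes the data
```

With b = n/2 the subsample variance is $\sigma^2/(n-1)$, which is close to the bootstrap's $\sigma^2/n$. For other values of b, the spread from subsampling has to be rescaled before it can serve as a standard error.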
11 votes, 1 answer
Chance that bootstrap sample is exactly the same as the original sample
Just want to check some reasoning.
If my original sample is of size $n$ and I bootstrap it, then my thought process is as follows:
$\frac{1}{n}$ is the chance of drawing any particular observation from the original sample. To ensure the next draw is not the…

Jayant.M (385 reputation)
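Two assumptions apply here: the n original observations are distinct, and "exactly the same" means the same multiset (order ignored). Under them, the bootstrap sample matches the original only when every observation is drawn exactly once. That happens in $n!$ of the $n^n$ equally likely ordered draws, so the probability is $n!/n^n$. If order also had to match, it would be $1/n^n$. Below is a minimal sketch with a Monte Carlo check; the function names are illustrative.

```python
import math
import random
from fractions import Fraction

def exact_same_prob(n):
    # P(each of n distinct observations is drawn exactly once) = n! / n^n
    return Fraction(math.factorial(n), n ** n)

def simulated_same_prob(n, reps=100_000, seed=1):
    rng = random.Random(seed)
    original = list(range(n))
    hits = 0
    for _ in range(reps):
        sample = rng.choices(original, k=n)  # n draws with replacement
        if sorted(sample) == original:       # same multiset as the original sample
            hits += 1
    return hits / reps

print(exact_same_prob(3))           # 2/9
print(float(exact_same_prob(10)))   # about 0.00036
print(simulated_same_prob(3))       # close to 0.222
```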
7 votes, 4 answers
Does sampling from a large dataset lead to correct inferences?
Say we have some population, and we obtain a "representative" random sample of that population, $(y_i, x_i)_{i = 1}^n$, where $n$ is very large (millions) and $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})'$ is a multivariate predictor of the response…

Marcel (1,200 reputation)
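A small simulation can make the question above concrete. It assumes a correctly specified linear model and a simple random subsample, with synthetic data. Under those assumptions, the OLS slope from the subsample still targets the same population slope. What changes is precision: the standard error grows by roughly $\sqrt{n/m}$. The sketch uses a single predictor for brevity, and the helper `ols_slope` is hypothetical.

```python
import math
import random

def ols_slope(xs, ys):
    # Simple linear regression of y on x: slope and its conventional standard error.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, math.sqrt(rss / (n - 2) / sxx)

rng = random.Random(0)
n, m = 100_000, 1_000
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]  # synthetic data, true slope 2

idx = rng.sample(range(n), m)  # simple random subsample without replacement
full_slope, full_se = ols_slope(xs, ys)
sub_slope, sub_se = ols_slope([xs[i] for i in idx], [ys[i] for i in idx])
print(full_slope, full_se)
print(sub_slope, sub_se, full_se * math.sqrt(n / m))  # last two should be close
```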
6 votes, 1 answer
Is it good practice to perform model parameter tuning on a random subsampling of a large dataset?
Many of the datasets we receive at the company where I am currently an intern are very large (many millions of rows, gigabytes or even terabytes of data).
While running machine learning experiments, I find myself wanting to use (cross…

TBZ92 (163 reputation)
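The workflow described in the question can be sketched as follows:

1. Draw a random subsample.
2. Tune a hyperparameter on it with a hold-out split.
3. Refit on more data with the chosen value.

The model (1-D k-nearest-neighbour regression), the grid of candidate k values and the data are all illustrative assumptions. One caveat applies: some hyperparameters depend on training-set size. For example, the best k for k-NN tends to grow as more training data is used, so a value tuned on a subsample may not carry over unchanged.

```python
import random

def knn_predict(train, x, k):
    # 1-D k-nearest-neighbour regression: mean target of the k closest points.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def holdout_mse(train, valid, k):
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in valid) / len(valid)

def tune_k_on_subsample(data, m, candidates, seed=0):
    rng = random.Random(seed)
    sub = rng.sample(data, m)             # random subsample, without replacement
    cut = int(0.8 * m)
    train, valid = sub[:cut], sub[cut:]   # 80/20 hold-out split of the subsample
    scores = {k: holdout_mse(train, valid, k) for k in candidates}
    return min(scores, key=scores.get), scores

gen = random.Random(42)
data = [(x, x * x / 10 + gen.gauss(0, 1))            # synthetic data
        for x in (gen.uniform(0, 10) for _ in range(20_000))]

best_k, scores = tune_k_on_subsample(data, m=1_000, candidates=[1, 5, 15, 50, 200])
print(best_k, scores)
```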
5 votes, 1 answer
Intuition behind m-out-of-n bootstrap
I am trying to get some intuition for why the m-out-of-n bootstrap works but haven't been able to find a good explanation. I would really appreciate any input on this.
I think I do understand what the bootstrap is about: estimating how…

RevealedPreference (90 reputation)
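A classic illustration of the m-out-of-n bootstrap is the sample maximum of Uniform(0, θ) data, where the ordinary n-out-of-n bootstrap is known to fail. With n draws, the resampled maximum equals the sample maximum with probability $1 - (1 - 1/n)^n \approx 1 - e^{-1} \approx 0.632$. This leaves a large atom at zero that does not shrink as n grows. Drawing only m ≪ n points (with m → ∞ and m/n → 0) makes that probability roughly m/n.

The sketch below uses $m(\max_n - \max^*_m)$ to mimic $n(\theta - \max_n)$, whose limit is exponential with mean θ. The uniform data and the choice m = √n are illustrative assumptions, not a recommendation.

```python
import math
import random

def m_out_of_n_roots(data, m, reps, seed=0):
    # Resample m points WITH replacement and record the scaled root
    # m * (max(data) - max(resample)), which mimics n * (theta - max(data)).
    rng = random.Random(seed)
    top = max(data)
    return [m * (top - max(rng.choices(data, k=m))) for _ in range(reps)]

def share_at_zero(roots):
    # Fraction of resamples whose maximum equals the sample maximum.
    return sum(r == 0 for r in roots) / len(roots)

rng = random.Random(1)
theta, n = 1.0, 10_000
data = [rng.uniform(0, theta) for _ in range(n)]

full = m_out_of_n_roots(data, n, reps=200)      # ordinary bootstrap (m = n)
m = int(math.sqrt(n))                           # illustrative choice: m = 100
small = m_out_of_n_roots(data, m, reps=2_000)   # m-out-of-n bootstrap

print(share_at_zero(full))      # large atom at zero, near 1 - 1/e
print(share_at_zero(small))     # small atom, roughly m / n
print(sum(small) / len(small))  # compare with the limiting mean theta = 1
```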
5 votes, 1 answer
What is a good introductory text on resampling methods?
I have found a few decent ones about specific resampling applications such as bootstrapped confidence intervals, but nothing broader. A journal article or book chapter would be preferable to an entire book, but all recommendations are welcome.…

Jeffrey Girard (3,922 reputation)
5 votes, 1 answer
What is the effect of using survey sample weights for a sub-sample?
If a sub-sample of the survey sample is selected based on certain demographic characteristics (e.g., age, race) and therefore may no longer be representative of the population, is it better not to use…

tvl (61 reputation)
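For the subpopulation question above, design-based survey analysis commonly treats the sub-sample as a domain. The selected units keep their original survey weights, and estimates are weighted within the domain. Here is a minimal sketch of a weighted (Hájek-type) domain mean, using hypothetical records and field names. It ignores the rest of the design (strata, clusters), which would still be needed for correct domain standard errors.

```python
def domain_weighted_mean(records, value, weight, in_domain):
    # Hajek-type estimator over the domain: sum(w * y) / sum(w),
    # using each unit's ORIGINAL survey weight.
    num = sum(r[weight] * r[value] for r in records if in_domain(r))
    den = sum(r[weight] for r in records if in_domain(r))
    return num / den

records = [  # hypothetical survey records
    {"age": 25, "income": 30.0, "w": 100.0},
    {"age": 34, "income": 45.0, "w": 300.0},
    {"age": 52, "income": 60.0, "w": 50.0},
    {"age": 61, "income": 40.0, "w": 200.0},
]

def under_40(r):
    return r["age"] < 40  # hypothetical domain definition

print(domain_weighted_mean(records, "income", "w", under_40))  # 41.25
unweighted = [r["income"] for r in records if under_40(r)]
print(sum(unweighted) / len(unweighted))                       # 37.5
```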
3 votes, 0 answers
Finding a sub-population of a dataset that matches another target dataset
Let's say one has a finite collection of i.i.d. samples from an unknown source distribution, $S=\{x_{i} \mid i \in [1,n_{S}],\ x_{i} \sim p_{X_{S}}(x)\}$, where each $x$ is multidimensional and has both continuous and discrete components.
One also has…

jeandut (136 reputation)
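One simple heuristic, sketched here under strong assumptions, is stratified subsampling:

1. Group the source samples by a stratum key built from their discrete components. Continuous components would first have to be binned.
2. Draw from each source stratum in proportion to that stratum's share of the target dataset.

This matches only the distribution of the chosen strata, not the full multivariate distribution. The helper name and the key function are hypothetical.

```python
import random
from collections import Counter, defaultdict

def match_strata(source, target, key, size, seed=0):
    # Draw about `size` source items so that stratum proportions follow the target.
    # Per-stratum counts are rounded, so the total can differ slightly from `size`.
    rng = random.Random(seed)
    target_share = Counter(key(t) for t in target)
    total = sum(target_share.values())
    pools = defaultdict(list)
    for s in source:
        pools[key(s)].append(s)
    chosen = []
    for stratum, count in target_share.items():
        want = round(size * count / total)
        pool = pools.get(stratum, [])
        # Without replacement; capped by what the source stratum actually contains.
        chosen.extend(rng.sample(pool, min(want, len(pool))))
    return chosen

source = [("a", i) for i in range(900)] + [("b", i) for i in range(100)]
target = [("a", 0)] * 50 + [("b", 0)] * 50
picked = match_strata(source, target, key=lambda item: item[0], size=100)
print(Counter(p[0] for p in picked))  # 50 items from each stratum
```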
3 votes, 0 answers
Is there a resampling method that blends subsampling with the bootstrap?
I apologize if this is an inappropriate question. I thought of it in class the other day, and I couldn't find a specific answer in my textbooks.
I am familiar with the two basic techniques for resampling data:
Subsampling - drawing m observations…

Cat C. (31 reputation)
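The two basic schemes listed in the question differ only in the resample size m and in whether draws are made with replacement, so one function can express both. A well-known scheme between the two draws m < n points with replacement; this is the m-out-of-n bootstrap. The following is a minimal sketch, and the function name is hypothetical.

```python
import random

def resample(data, m, replace, rng):
    """Draw m observations from data.

    replace=False, m < n : subsampling
    replace=True,  m == n: textbook (n-out-of-n) bootstrap
    replace=True,  m < n : m-out-of-n bootstrap
    """
    if replace:
        return rng.choices(data, k=m)
    if m > len(data):
        raise ValueError("cannot draw more than n observations without replacement")
    return rng.sample(data, m)

rng = random.Random(0)
data = list(range(10))
print(resample(data, 4, replace=False, rng=rng))  # 4 distinct values
print(resample(data, 10, replace=True, rng=rng))  # duplicates are likely
print(resample(data, 4, replace=True, rng=rng))   # an m-out-of-n bootstrap draw
```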
2 votes, 1 answer
Subsample analysis based on country-level indices?
In a generalized difference-in-differences setting from Dasgupta (2019) with multiple event dates (staggered implementation of the laws):
The baseline equation:
$Y_{it} = \alpha + \beta\,(\text{Leniency Law})_{kt} + \delta X_{ikt} + \theta_t +$ …

Louise (97 reputation)
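A subsample analysis by a country-level index could be sketched as follows: split countries at the median of the index, then estimate the treatment effect separately in each half. For brevity this uses a single-event 2×2 difference-in-differences on group means, not the generalized staggered specification in the question. The field names, the index values and the outcomes are hypothetical.

```python
from statistics import fmean, median

def did_2x2(rows):
    # (treated_post - treated_pre) - (control_post - control_pre), on group means.
    def cell(treated, post):
        return fmean(r["y"] for r in rows if r["treated"] == treated and r["post"] == post)
    return (cell(True, True) - cell(True, False)) - (cell(False, True) - cell(False, False))

def did_by_index(rows, country_index):
    # Split countries at the median of a country-level index; estimate DiD per half.
    cut = median(country_index.values())
    high = [r for r in rows if country_index[r["country"]] > cut]
    low = [r for r in rows if country_index[r["country"]] <= cut]
    return did_2x2(high), did_2x2(low)

rows = [  # hypothetical panel: one observation per country and period
    {"country": c, "treated": t, "post": p, "y": y}
    for c, t, p, y in [
        ("A", True, False, 10.0), ("A", True, True, 15.0),
        ("B", False, False, 10.0), ("B", False, True, 11.0),
        ("C", True, False, 10.0), ("C", True, True, 12.0),
        ("D", False, False, 10.0), ("D", False, True, 11.0),
    ]
]
index = {"A": 0.9, "B": 0.8, "C": 0.2, "D": 0.1}  # hypothetical country-level index
print(did_by_index(rows, index))  # (4.0, 1.0)
```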
2 votes, 0 answers
Subsampling the "right" amount of data to train an ML model
I am training a machine learning model (i.e., a classifier) on a large dataset. I know that I can get the same results using less data (about 30%) but I would like to avoid the trial and error process to find the 'right' amount of data to retain…

giz (21 reputation)
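A common alternative to ad hoc trial and error is a learning curve. Train on increasing fractions of the data, and stop once the next step improves the validation score by less than a chosen tolerance. In this minimal sketch, `train_and_score` is a hypothetical callable standing in for your own training and validation code. The fraction grid, the tolerance and the toy scoring curve are assumptions.

```python
import math

def smallest_sufficient_fraction(train_and_score, fractions, tol=0.005):
    # Walk up the grid of data fractions; stop when the next step improves the
    # validation score by less than `tol` (a plateau in the learning curve).
    prev_frac = fractions[0]
    prev_score = train_and_score(prev_frac)
    for frac in fractions[1:]:
        score = train_and_score(frac)
        if score - prev_score < tol:
            return prev_frac, prev_score
        prev_frac, prev_score = frac, score
    return prev_frac, prev_score

def toy_score(frac):
    # Hypothetical stand-in: accuracy that saturates as more data is used.
    return 0.9 - 0.2 * math.exp(-10 * frac)

print(smallest_sufficient_fraction(toy_score, [0.05, 0.1, 0.2, 0.3, 0.5, 0.75, 1.0]))
```

With real models the score at each fraction is noisy. Averaging over a few random subsamples per fraction makes the stopping rule less likely to trigger early.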
2 votes, 1 answer
Do both bootstrapping with and without replacement create a distribution?
I'm having a "noisy debate" with colleagues about whether sampling without replacement can still create a distribution.
Methodology:
A bootstrap (iterative process where I calculate Somers' D for new samples) is done with and without replacement.
I…

user235111 (35 reputation)
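The core of this debate can be checked directly. Drawing n of n observations without replacement only permutes the rows. Any statistic that ignores row order, Somers' D included, is therefore identical in every resample, and the resulting "distribution" is a single point. Drawing with replacement does produce variation, and so does drawing m < n rows without replacement (subsampling). This minimal sketch assumes Somers' D of y with respect to x, defined as (concordant − discordant) / (pairs not tied on x), and uses made-up data.

```python
import random
from itertools import combinations

def somers_d(x, y):
    # Somers' D of y with respect to x (assumed convention):
    # (concordant - discordant) / number of pairs not tied on x.
    conc = disc = untied_x = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            continue  # pairs tied on x (e.g. duplicated rows) are excluded
        untied_x += 1
        if dx * dy > 0:
            conc += 1
        elif dx * dy < 0:
            disc += 1
    return (conc - disc) / untied_x

rng = random.Random(3)
x = [rng.random() for _ in range(30)]
rows = [(xi, xi + rng.gauss(0, 0.3)) for xi in x]  # made-up paired data

def resampled_d(k, replace):
    draw = rng.choices(rows, k=k) if replace else rng.sample(rows, k)
    xs, ys = zip(*draw)
    return somers_d(xs, ys)

perm = {resampled_d(30, replace=False) for _ in range(200)}  # n of n, no replacement
boot = {resampled_d(30, replace=True) for _ in range(200)}   # textbook bootstrap
sub = {resampled_d(20, replace=False) for _ in range(200)}   # subsampling, m < n
print(len(perm), len(boot), len(sub))  # perm holds exactly 1 distinct value
```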
2 votes, 1 answer
How to: Normal sub-sampling from uniformly distributed data samples
Given a uniformly distributed sample of data, I need to sub-sample the points so that they follow a Normal distribution, i.e., denser around the mean and sparser as we move away from it. What could the steps be?

Saransh (23 reputation)
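One standard approach is rejection (acceptance) sampling. Keep each point independently, with probability proportional to the ratio of the target density to the source density. For a uniform source that ratio is just the shape of the normal density. A point x is therefore kept with probability $\exp(-(x-\mu)^2 / (2\sigma^2))$, which equals 1 at the mean and falls off in the tails. The kept points follow a normal distribution truncated to the range of the data. In this minimal sketch, μ and σ are parameters you would choose, and the data are synthetic.

```python
import math
import random
from statistics import fmean, stdev

def normal_subsample(points, mu, sigma, seed=0):
    # Rejection sampling from a uniform source: keep x with probability
    # exp(-(x - mu)^2 / (2 sigma^2)) = target density / (M * uniform density).
    rng = random.Random(seed)
    return [x for x in points
            if rng.random() < math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))]

gen = random.Random(7)
points = [gen.uniform(-10, 10) for _ in range(100_000)]  # synthetic uniform data
kept = normal_subsample(points, mu=0.0, sigma=2.0)
print(len(kept), fmean(kept), stdev(kept))  # mean near 0, sd near 2
```

The expected share of points kept is about $\sigma\sqrt{2\pi}$ divided by the width of the data range, here roughly 25%.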
2 votes, 1 answer
Subsampling to determine a standard error, how does it work?
I need to calculate a standard error for an analysis of a complicated dataset (> 1700 records) that uses genetic matching. Bootstrapping leads to very long computation times (because of the genetic matching).
My professor suggested subsampling as an…

dietervdf (1,132 reputation)
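The idea behind a subsampling standard error rests on an assumption: the estimator's spread shrinks like $1/\sqrt{n}$. If so, an estimate computed on a subsample of size b ≪ n varies roughly $\sqrt{n/b}$ times as much as the full-sample estimate. The procedure follows from that:

1. Recompute the estimator on many subsamples drawn without replacement.
2. Take the standard deviation of those estimates.
3. Rescale it by $\sqrt{b/n}$.

Each replicate only processes b records, which is where the saving for an expensive step such as matching comes from. This is a minimal sketch for a generic `estimator` callable. The values of b and `reps` are assumptions, and the sample mean stands in for the real matching estimator.

```python
import math
import random
from statistics import fmean, stdev

def subsampling_se(data, estimator, b, reps=500, seed=0):
    # Recompute the estimator on `reps` subsamples of size b drawn WITHOUT
    # replacement, then rescale their spread from size b to size n.
    # Assumes the estimator's SE shrinks like 1/sqrt(sample size) and b << n.
    rng = random.Random(seed)
    estimates = [estimator(rng.sample(data, b)) for _ in range(reps)]
    return math.sqrt(b / len(data)) * stdev(estimates)

gen = random.Random(11)
data = [gen.gauss(0, 1) for _ in range(1_700)]   # synthetic stand-in data
se_sub = subsampling_se(data, fmean, b=100)      # fmean stands in for the real estimator
se_textbook = stdev(data) / math.sqrt(len(data))
print(se_sub, se_textbook)
```

When b is not small relative to n, the subsample estimates are less variable by roughly a factor of $\sqrt{1 - b/n}$ (the finite-population correction). This is why b ≪ n is assumed.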
2 votes, 0 answers
Formal method to find optimal sub-sample size from large sample for multiple regression
I have labour market data with 9 million observations for a single time period (i.e., cross-sectional data). I am studying the determinants of wages in a single-equation multiple regression with around 300 regressors. If $Y$ is the vector of wages (9…

luchonacho (2,568 reputation)
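One simple precision-based rule is a heuristic, not a formal optimality result. The standard error of an OLS coefficient shrinks roughly like $1/\sqrt{m}$ in the subsample size m. Suppose a pilot fit on $m_0$ observations gives standard error $s_0$ for the coefficient of interest. Reaching a target standard error t then needs about $m \approx m_0 (s_0/t)^2$ observations, capped at the full n. With around 300 regressors, the binding constraint would be the least precisely estimated coefficient you care about. The pilot numbers below are hypothetical.

```python
import math

def required_subsample_size(m0, se_pilot, se_target, n_total):
    # SE ~ c / sqrt(m)  =>  m = m0 * (se_pilot / se_target)^2, capped at n_total.
    m = math.ceil(m0 * (se_pilot / se_target) ** 2)
    return min(m, n_total)

# Hypothetical pilot: 100,000 rows gave SE = 0.02 on the coefficient of interest.
print(required_subsample_size(100_000, 0.02, 0.01, 9_000_000))  # 400000
```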