Highest Voted 'subsampling' Questions - Statistical Analysis Stack Exchange

16

votes

1 answer

Bootstrap methodology. Why resample "with replacement" instead of random subsampling?

The bootstrap method has become widespread in recent years, and I use it a lot, especially because the reasoning behind it is quite intuitive. But there's one thing I don't understand: why did Efron choose to resample with replacement instead of…

bootstrap resampling subsampling

asked Sep 07 '15 at 13:36

Bakaburg

2,293
3
21
30
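
For readers comparing the two schemes named in this question, here is a minimal sketch of how the draws differ mechanically. It is not taken from the question or its answer; the function names `bootstrap_resample` and `random_subsample` are hypothetical, and the subsample size `m` is an assumed parameter.

```python
import random


def bootstrap_resample(data, rng):
    """Draw len(data) items WITH replacement (an item may repeat)."""
    return rng.choices(data, k=len(data))


def random_subsample(data, m, rng):
    """Draw m items WITHOUT replacement (each item appears at most once)."""
    return rng.sample(data, m)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the run is repeatable
    data = list(range(10))
    print(bootstrap_resample(data, rng))
    print(random_subsample(data, 4, rng))
```

The bootstrap draw keeps the sample size at n, at the cost of repeated observations; the subsample has no repeats but is necessarily smaller than n.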

11

votes

1 answer

Chance that bootstrap sample is exactly the same as the original sample

Just want to check some reasoning. If my original sample is of size $n$ and I bootstrap it, then my thought process is as follows: $\frac{1}{n}$ is the chance of any observation drawn from the original sample. To ensure the next draw is not the…

sampling bootstrap sample-size subsampling

asked Jan 23 '17 at 01:50

Jayant.M

385
1
8
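
The excerpt above is truncated, so the following is not the asker's derivation. It is a small sketch of the standard counting argument, assuming the $n$ observations are distinct and that order does not matter: a bootstrap resample reproduces the original sample exactly when each observation is drawn once, which happens with probability $n!/n^n$. The function name is hypothetical.

```python
import math


def prob_bootstrap_equals_original(n: int) -> float:
    """Probability that n draws with replacement from n distinct
    observations hit every observation exactly once: n! / n**n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Work on the log scale so large n does not overflow.
    return math.exp(math.lgamma(n + 1) - n * math.log(n))


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 100):
        print(n, prob_bootstrap_equals_original(n))
```

The probability shrinks very quickly as $n$ grows, so in practice an exact reproduction of the original sample is rare for all but tiny samples.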

7

votes

4 answers

Does sampling from a large dataset lead to correct inferences?

Say we have some population, and we obtain a "representative" random sample of that population, $(y_i, x_i)_{i = 1}^n$, where $n$ is very large (millions) and $x_i = (x_{i1}, x_{i2}, ... x_{ip})'$ is a multivariate predictor of the response…

regression inference dataset large-data subsampling

asked Jan 15 '19 at 20:31

Marcel

1,200
6
24

6

votes

1 answer

Is it good practice to perform model parameter tuning on a random subsampling of a large dataset?

A lot of the datasets presented to us in the company at which I'm currently an intern are very large (many millions of rows / Gigabytes, or even Terabytes of data). While running machine learning experiments, I find myself wanting to use (cross…

modeling optimization algorithms hyperparameter subsampling

asked Oct 18 '16 at 16:28

TBZ92

163
3

5

votes

1 answer

Intuition behind m-out-of-n bootstrap

I am trying to get some intuition on why m-out-of-n bootstrap works but haven't been able to find a good explanation. I would really appreciate any input on this. I think I do understand what bootstrap is about -- estimating how…

bootstrap intuition subsampling

asked Jul 07 '20 at 20:20

RevealedPreference

90
5

5

votes

1 answer

What is a good introductory text on resampling methods?

I have found a few decent ones about specific resampling applications such as bootstrapped confidence intervals, but nothing broader. A journal article or book chapter would be preferable to an entire book, but all recommendations are welcome.…

references bootstrap resampling subsampling jackknife

asked Mar 29 '17 at 22:55

Jeffrey Girard

3,922
1
13
36

5

votes

1 answer

What is the effect of using survey sample weights for a sub-sample?

If a sub-sample of the survey sample, selected based on certain demographic characteristics of the data (e.g. age, race etc.), is used, which means the sub-sample might not be representative of the population anymore, is it better to not use…

survey sample weighted-data subsampling survey-weights

asked Jun 11 '16 at 05:39

tvl

61
6

3

votes

0 answers

Finding a sub-population from dataset matching another target dataset

Let's say one has a finite collection of i.i.d. samples from an unknown source distribution $S=\{x_{i} | i \in [1,n_{S}], x_{i} \sim p_{X_{S}}(x)\}$, where each $x$ is multidimensional and has continuous and discrete components. One also has…

distance iid subsampling

asked Oct 29 '20 at 18:27

jeandut

136
5
15

3

votes

0 answers

Is there a resampling method that blends subsampling with the bootstrap?

I apologize if this is an inappropriate question. I thought of it in class the other day, and I couldn't find a specific answer in my textbooks. I am familiar with the two basic techniques for resampling data: Subsampling - drawing m observations…

bootstrap resampling methodology subsampling

asked Jun 13 '15 at 12:18

Cat C.

31
3

2

votes

1 answer

Subsample analysis based on country-level indices?

In a generalized difference-in-differences setting from Dasgupta (2019) with multiple event dates (staggered implementation of laws), the baseline equation is: $Y_{it} = \alpha + \beta\,(\text{Leniency Law})_{kt} + \delta X_{ikt} + \theta_t +$…

econometrics difference-in-difference subsampling

asked May 30 '21 at 21:59

Louise

97
1
16

2

votes

0 answers

Subsampling the "right" amount of data to train an ML model

I am training a machine learning model (i.e., a classifier) on a large dataset. I know that I can get the same results using less data (about 30%) but I would like to avoid the trial and error process to find the 'right' amount of data to retain…

sampling supervised-learning subsampling

asked Apr 15 '21 at 07:18

giz

21
2

2

votes

1 answer

Do both Bootstrap with and without replacement create a distribution?

I'm having a "noisy debate" with colleagues about whether sampling without replacement can still create a distribution. Methodology: A bootstrap (iterative process where I calculate Somers' D for new samples) is done with and without replacement. I…

distributions normal-distribution sampling bootstrap subsampling

asked Jul 04 '19 at 08:40

user235111

35
3

2

votes

1 answer

How to: Normal sub-sampling out of uniformly distributed data samples

Given a uniformly distributed sample of data, I need to sub-sample the points so that they follow a Normal distribution, i.e. denser around the mean and sparser farther out. What could be the steps?

normal-distribution uniform-distribution subsampling

asked Oct 16 '17 at 10:58

Saransh

23
4
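
One possible set of steps for the question above, sketched here as an assumption rather than as the accepted answer: keep each point with probability proportional to the target normal density at that point (a rejection-style filter). Because the input is uniform, the retained points are then approximately normal, truncated to the range of the data. The function name and the parameters `mu` and `sigma` are hypothetical choices for illustration.

```python
import math
import random


def normal_subsample(points, mu, sigma, rng):
    """Keep each point x with probability exp(-(x - mu)^2 / (2 sigma^2)).

    The acceptance probability is the normal density rescaled so that
    its peak equals 1; uniformly spread inputs therefore survive in
    proportion to the normal density.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    kept = []
    for x in points:
        accept_prob = math.exp(-0.5 * ((x - mu) / sigma) ** 2)
        if rng.random() < accept_prob:
            kept.append(x)
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    uniform_points = [rng.uniform(-3.0, 3.0) for _ in range(20000)]
    kept = normal_subsample(uniform_points, mu=0.0, sigma=1.0, rng=rng)
    print(len(kept), sum(kept) / len(kept))
```

The size of the result is random, and a narrow `sigma` relative to the data range discards most points, so this approach trades sample size for shape.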

2

votes

1 answer

Subsampling to determine a standard error, how does it work?

I need to calculate the standard error on a complicated dataset (> 1700 records) which uses genetic matching. Using bootstrap results in very high computation time (because of the genetic matching). My professor gives subsampling as an…

r standard-error resampling subsampling

asked May 03 '16 at 21:15

dietervdf

1,132
1
9
20
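
The question above uses R; the sketch below is in Python and only illustrates the general idea, not the asker's genetic-matching estimator or their professor's exact recipe. It assumes a root-n consistent statistic and a subsample size `m` much smaller than `n`, in which case the spread of the subsample estimates is rescaled by $\sqrt{m/n}$ to approximate the standard error at the full sample size. All names are hypothetical.

```python
import math
import random
import statistics


def subsample_standard_error(data, statistic, m, n_subsamples, rng):
    """Approximate the standard error of statistic(data) by subsampling.

    Each subsample of size m is drawn WITHOUT replacement, so the
    (possibly expensive) statistic only ever sees m < n records.
    """
    n = len(data)
    if not 1 < m < n:
        raise ValueError("need 1 < m < len(data)")
    estimates = [statistic(rng.sample(data, m)) for _ in range(n_subsamples)]
    # Estimates from size-m subsamples are noisier than the size-n estimate;
    # sqrt(m / n) rescales their spread (assumes a root-n rate).
    return statistics.stdev(estimates) * math.sqrt(m / n)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print(subsample_standard_error(data, statistics.mean, 100, 500, rng))
```

The saving comes from evaluating the statistic on `m` records instead of `n` in every replication; the choice of `m` is a tuning decision that this sketch does not address.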

2

votes

0 answers

Formal method to find optimal sub-sample size from large sample for multiple regression

I have labour market data for 9 million observations, for a single time period (i.e. cross-section data). I am studying the determinants of wages in a single-equation multiple regression with around 300 regressors. If $Y$ is the vector of wages (9…

multiple-regression sample-size subsampling

asked Jan 11 '16 at 17:19

luchonacho

2,568
3
21
38

Questions tagged [subsampling]