Bootstrapping and comparing mean distributions

Question

Is the following a reasonable approach to assess the statistical significance of the difference between two groups' means?

For each group:

1) Subsample with replacement
2) Take the mean of the subsample
3) Repeat 10,000 times to build a distribution of means
4) Carry out a t-test to assess the difference between those two distributions

(i.e. bootstrapping to build a distribution of means)

The two datasets are very different in size (~100 vs. 100,000). The alternative approach would be to subsample from each to build two equally sized datasets, and then use a t-test on those two samples. The problem I have with this is I'm not sure if the smaller of the two sets is normally distributed (while the larger is), which may invalidate the t-test assumptions?

Because the results of your t-test will depend on how often you resample, it tells you nothing about your data. If you want to compare the means of the two groups, why aren't you simply applying a t-test to *them*? — whuber, Jul 07 '15 at 13:40
Because I'm worried that the smaller of the two groups may not be normally distributed. Obviously I can assess this (e.g. using a qqplot). Assuming the data **are** normally distributed: is it ok to then bootstrap to create two equally sized datasets and then run a t-test on those? If they are **not** normally distributed, would it be appropriate to bootstrap to generate two equally sized datasets and then compare them using some kind of non-parametric test (e.g. Kolmogorov–Smirnov)? — DoctorOctagon, Jul 07 '15 at 13:54
It sounds like you might have an unusual conception of what the bootstrap is. Consider reviewing your understanding by searching our site or consulting a textbook. In the meantime, the t-test could not care less about the distribution of the smaller of the groups. All that matters is the sampling distribution of its sample mean. Unless that group has a high skew, it is quite possible the distribution of its sample mean is decently approximated by a normal distribution and a t-test would then give reliable results. *That* is something you could ascertain with a bootstrap procedure if you like. — whuber, Jul 07 '15 at 14:34
@whuber While I agree with your advice, I don't agree with the statement "*All that matters is the sampling distribution of its sample mean*". To have a t-distribution, what matters is the sampling distribution of the t-statistic which is distinct from the sampling distribution of the sample mean; it has a numerator and a denominator (and their dependence also comes into it). However, it may be that the smaller sample is large enough that the denominator may be treated as nearly-constant (and in that case, the distribution of the smaller-sample mean would be the main issue). ... (ctd) — Glen_b, Jul 08 '15 at 02:44
(ctd) ... while I don't think you need my advice (I won't be telling you anything you don't already know about better than I do), I wanted to mention the distinction for other readers. [If one simply bootstraps the difference in sample means - which would make perfect sense - then the discussion of the distribution of the t-statistic becomes moot.] — Glen_b, Jul 08 '15 at 02:49
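
The bootstrap of the difference in sample means mentioned in the last comment could be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the function name `bootstrap_mean_diff`, the percentile interval, the seed, and the simulated data are all assumptions. Each group is resampled with replacement at its own original size, so the unequal group sizes are kept rather than equalized.

```python
import numpy as np

def bootstrap_mean_diff(x, y, n_resamples=10_000, alpha=0.05, seed=0):
    """Bootstrap the difference in means, mean(x) - mean(y).

    Hypothetical helper: returns the observed difference, a percentile
    interval (one of several possible bootstrap intervals), and the
    bootstrap replicates themselves.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample each group independently, with replacement, at its own size.
        bx = rng.choice(x, size=x.size, replace=True)
        by = rng.choice(y, size=y.size, replace=True)
        diffs[i] = bx.mean() - by.mean()
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return x.mean() - y.mean(), (lower, upper), diffs

# Simulated stand-ins for a small and a large group (not the OP's data).
rng = np.random.default_rng(1)
small = rng.normal(0.5, 1.0, size=100)
large = rng.normal(0.0, 1.0, size=5_000)
observed, (lower, upper), diffs = bootstrap_mean_diff(small, large, n_resamples=2_000)
print(f"observed difference: {observed:.3f}")
print(f"95% percentile interval: ({lower:.3f}, {upper:.3f})")
```

A histogram of `diffs` also shows directly whether the sampling distribution of the difference looks roughly normal, which is the kind of check suggested earlier in the comments.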

Lauren Goodwin · Answer 1 · 2015-07-08T18:17:23.240

This is not how you would do a simulation test (and what you describe is not a bootstrap test). What you want to do is mix all the data together, randomly redivide it into two new groups of the original sizes, find the mean of each group, take the difference, and plot it. Repeat this many times, 10,000 for instance. You can then find a p-value by counting all the results that are as extreme as or more extreme than your observed result (the original difference in means) and dividing by 10,000. This is a non-parametric analogue of the t-test called a permutation test. You could instead use a t-test for the difference in means, but it makes more assumptions about the data than this test does.
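
The procedure described above could be sketched in Python along these lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `permutation_test`, the two-sided comparison of absolute differences, the seed, and the simulated data are choices made for the example and do not come from the answer.

```python
import numpy as np

def permutation_test(x, y, n_resamples=10_000, seed=0):
    """Monte Carlo permutation test for a difference in means.

    Hypothetical helper: pools both groups, randomly redivides them into
    groups of the original sizes, and counts how often the reshuffled
    difference is at least as extreme as the observed one (two-sided).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    n_x = x.size
    count = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_x].mean() - shuffled[n_x:].mean()
        # Count results as extreme as or more extreme than the observed one.
        if abs(diff) >= abs(observed):
            count += 1
    # Proportion of resamples, as described in the answer.
    return observed, count / n_resamples

# Simulated stand-ins for a small and a large group (not the OP's data).
rng = np.random.default_rng(1)
small = rng.normal(0.5, 1.0, size=100)
large = rng.normal(0.0, 1.0, size=5_000)
observed, p_value = permutation_test(small, large, n_resamples=2_000)
print(f"observed difference: {observed:.3f}, p-value: {p_value:.4f}")
```

With only a finite number of random reshuffles the p-value is an estimate, and it can come out as exactly zero when no reshuffle is as extreme as the observed difference; some authors use `(count + 1) / (n_resamples + 1)` to avoid that.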

edited Jul 08 '15 at 18:17

answered Jul 07 '15 at 14:49

You're describing a permutation test, or more strictly speaking, a randomization test (which is fine; that would be my first thought), though the p-value isn't found by adding up the more extreme values ($d_1+d_2+...$), but by *counting* them before dividing by the number of resamples (I presume that's what you meant, but it may not be clear to others). It's also possible to actually do a bootstrap test, though as you say, that's not what the OP described. – Glen_b Jul 08 '15 at 02:52