Suppose we have two data sets $X_1,\ldots,X_m$ and $Y_1,\ldots,Y_n$, each i.i.d., and we want to determine whether $\mathbb{E}[X_1] = \mathbb{E}[Y_1]$ using the statistic $\hat{\Delta}_{m,n} = \bar{X}_m - \bar{Y}_n$. A resampling (permutation) procedure would treat the observed values as if they all came from a single pooled sample, repeatedly reassign them at random to an $X$ group of size $m$ and a $Y$ group of size $n$, and compute $\hat{\Delta}_{m,n}$ for each reassignment. The observed difference would then be compared to these Monte Carlo differences to compute a $p$-value.
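To make the procedure concrete, here is a minimal sketch of what I have in mind (the function name, the number of permutations $B$, and the add-one correction are my own choices, not taken from any particular reference):

```python
import numpy as np

def permutation_test_mean_diff(x, y, B=10_000, rng=None):
    """Two-sided permutation p-value for Delta = mean(x) - mean(y)."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x), np.asarray(y)
    m = len(x)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()

    count = 0
    for _ in range(B):
        perm = rng.permutation(pooled)          # relabel the pooled sample
        delta = perm[:m].mean() - perm[m:].mean()
        count += abs(delta) >= abs(observed)    # two-sided comparison
    # add-one correction so the reported p-value is never exactly zero
    return (count + 1) / (B + 1)
```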
If one were to go even further and ask whether the two data sets have the same distribution, the same procedure could be applied with the Kolmogorov-Smirnov statistic (or any other metric comparing distributions) in place of $\hat{\Delta}_{m,n}$.
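The same resampling scheme with the KS statistic swapped in would look like this sketch (scipy's `ks_2samp` is used only to compute the statistic itself; everything else is my own framing):

```python
import numpy as np
from scipy.stats import ks_2samp

def permutation_test_ks(x, y, B=10_000, rng=None):
    """Permutation p-value for the two-sample Kolmogorov-Smirnov statistic."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x), np.asarray(y)
    m = len(x)
    pooled = np.concatenate([x, y])
    observed = ks_2samp(x, y).statistic

    count = 0
    for _ in range(B):
        perm = rng.permutation(pooled)
        stat = ks_2samp(perm[:m], perm[m:]).statistic
        count += stat >= observed               # large KS values indicate discrepancy
    return (count + 1) / (B + 1)
```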
How would I theoretically justify these procedures? The questions that clearly need to be answered are: (1) Does the test have appropriate behavior (e.g., the correct level) under the null hypothesis? (2) Is the test consistent under the alternative hypothesis? A nice additional question would be: (3) How can the power of the test be characterized? If I wanted to get into the mathematical details of why these procedures work, what would I see?