
Permutation tests assume exchangeability of the response/observations under the null hypothesis.

In what practical situations is this clearly violated? When is it unproblematic?

Edit (an additional question, so that this is not considered a duplicate): if we permute within an additional blocking structure (e.g. patients) and sum the test statistic across blocks, we would only need exchangeability within each block, right?
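To make the blocked scheme concrete, here is a minimal sketch in R of what is meant. Everything in it is an assumption for illustration: the data are simulated, and `block`, `trt`, and `stat` are hypothetical names. Labels are shuffled only inside each block, and the statistic is the sum of the within-block mean differences.

```r
# Sketch: permute treatment labels within each block (e.g. patient) only.
set.seed(1)
block = factor(rep(1:8, each = 6))           # 8 hypothetical patients, 6 obs each
trt   = rep(rep(c(0, 1), each = 3), 8)       # 3 control, 3 treated per patient
y     = rnorm(48) + rep(rnorm(8, sd = 3), each = 6)  # block effects, no treatment effect

# Test statistic: sum over blocks of (treated mean - control mean)
stat = function(y, trt, block)
  sum(tapply(seq_along(y), block, function(i)
    mean(y[i][trt[i] == 1]) - mean(y[i][trt[i] == 0])))

s.obs = stat(y, trt, block)
s.prm = replicate(2000, {
  trt.star = ave(trt, block, FUN = sample)   # shuffle labels inside each block
  stat(y, trt.star, block)
})
mean(abs(s.prm) >= abs(s.obs))               # two-sided permutation P-value
```

The large between-patient differences never enter the permutation distribution, because no value is ever moved from one block to another.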

PS: I am not looking for tests to verify this symmetry condition as it is a condition under the null, not of the observed data...

Michael M
  • Another possible name could be *permutability*: Can you permute the sample without destroying structure? Excludes time series ... – kjetil b halvorsen Jun 19 '20 at 21:04
  • Other possible dups: https://stats.stackexchange.com/questions/188842/intuition-behind-exchangeability-property-and-its-use-in-statistical-inference, https://stats.stackexchange.com/questions/3520/can-someone-explain-the-concept-of-exchangeability, https://stats.stackexchange.com/questions/340903/understanding-exchangeability – kjetil b halvorsen Jun 19 '20 at 21:07
  • While some answers are excellent (including yours), I don't see how they would answer my question about permutation tests and practical situations. I think your first comment goes in a very good direction. – Michael M Jun 19 '20 at 21:25

3 Answers


One situation in which exchangeability does not hold occurs when we're testing whether means of two groups are equal, but suspect variances may be unequal.

To be specific, let's look at the following situation: x1 is a sample of size $n_1 = 10$ from a normal population with $\mu_1=100$ and $\sigma_1=20,$ and x2 is a sample of size $n_2 = 50$ from a normal population with $\mu_2=100$ and $\sigma_2=4.$

Inappropriate pooled t test. Suppose we try to use a pooled 2-sample t test of $H_0:\mu_1=\mu_2$ vs $H_a:\mu_1\ne\mu_2.$ The pooled test assumes the two samples are from populations with equal variances, which is false here. As a result, the true rejection rate (about $36\%)$ of a test nominally at level $\alpha=0.05=5\%$ is much larger than $5\%,$ as shown by the following simulation in R. That is a very large rate of false rejections of a true null hypothesis.

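set.seed(2020)
# P-values of the pooled test for 10^5 pairs of samples (SDs 20 and 4)
pv = replicate(10^5, t.test(rnorm(10,100,20),
                 rnorm(50,100,4), var.equal=TRUE)$p.value)
mean(pv <= .05)    # proportion of rejections at the 5% level
[1] 0.35981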

Welch t test, not assuming equal variances. Situations like this one, with unequal variances, explain why many statisticians prefer the Welch two-sample t test, which does not assume equal variances in the two populations. The Welch test (with intended $\alpha=5\%)$ has a true significance level very nearly $5\%.$

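set.seed(2020)
# Same simulation, but t.test defaults to the Welch test (var.equal=FALSE)
pv = replicate(10^5, t.test(rnorm(10,100,20),
                 rnorm(50,100,4))$p.value)
mean(pv <= .05)    # proportion of rejections at the 5% level
[1] 0.05056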

Flawed permutation test with non-exchangeable samples. A permutation test using the difference in sample means as metric is no 'cure' for lack of exchangeability caused by heteroscedasticity.

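set.seed(620)
m = 10^5;  pv = numeric(m)
B = 2000;  d.prm = numeric(B)     # storage for the permuted differences
for(i in 1:m) {
  # note: the SD of x2 is 5 here, not 4 as in the t test simulations
  x1 = rnorm(10, 100, 20);  x2 = rnorm(50, 100, 5)
  x = c(x1, x2)
  d.obs = mean(x[1:10]) - mean(x[11:60])

  for(j in 1:B) {
    x.prm = sample(x)             # random relabelling of the 60 values
    d.prm[j] = mean(x.prm[1:10]) - mean(x.prm[11:60]) }
  pv[i] = mean(abs(d.prm) >= abs(d.obs))   # two-sided permutation P-value
}
mean(pv <= .05)
[1] 0.3634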

So the rejection rate of the permutation test, with difference in means as its metric and an intended $\alpha = 0.05,$ is about as high as for the pooled t test.

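Note: A permutation test with the Welch t statistic as metric copes much better with unequal variances (even if data may not be normal). The samples are still not exchangeable, but because the statistic is standardized using each group's own variance, its permutation distribution approximates its null distribution. The significance level would be approximately correct rather than exact.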
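A rough way to check the note above is to repeat the simulation with the Welch statistic as metric. The sketch below is only illustrative: `welch.t` and `perm.welch` are hypothetical helper names (not functions from any package), and the numbers of replications and permutations are kept small so that it runs in a few seconds. No output is shown because this version was not part of the original runs.

```r
# Welch t statistic computed directly (faster than calling t.test each time)
welch.t = function(a, b)
  (mean(a) - mean(b)) / sqrt(var(a)/length(a) + var(b)/length(b))

# Two-sided permutation P-value with the Welch t statistic as metric
perm.welch = function(x1, x2, B = 400) {
  n1 = length(x1);  x = c(x1, x2);  n = length(x)
  t.obs = welch.t(x1, x2)
  t.prm = replicate(B, {
    idx = sample(n, n1)              # indices relabelled as 'group 1'
    welch.t(x[idx], x[-idx]) })
  mean(abs(t.prm) >= abs(t.obs))
}

set.seed(2020)
pv = replicate(400, perm.welch(rnorm(10, 100, 20), rnorm(50, 100, 4)))
mean(pv <= .05)    # estimated rejection rate at nominal 5% level
```

With so few replications the estimated rate is noisy, but it should be far below the roughly 36% seen for the difference in means.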

BruceET
  • Very helpful, thanks a lot. Regarding the unequal variance situation: this is only a problem if variances are assumed to be unequal under the null, right? So e.g. if we would compare two Poisson means, this is not an issue? – Michael M Jun 20 '20 at 08:24
  • If you know the variances are equal, or strongly suspect they are from past experience with similar data, then it is OK to use the pooled t test. If in doubt about equal variances, then use Welch. // In the Poisson case, if $H_0: \lambda_1 = \lambda_2,$ then under $H_0$ the variances must also be equal. The distribution for a test is always determined by $H_0.$ (But of course, _power_ computations consider the distribution under a particular alternative.) – BruceET Jun 20 '20 at 18:02

Another important case is tests for interaction. The null hypothesis of additivity does not imply exchangeability of the observations, because the main effects may still be present. In a linear, constant-variance model you can permute residuals instead (Anderson, 2001); in generalised linear models it's more complicated.
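As a minimal sketch of residual permutation in the linear case, one common scheme is to permute the residuals of the reduced (additive) model and add them back to its fitted values. That this particular variant is the intended one is an assumption, and the data and all names below are simulated and hypothetical.

```r
# Sketch: test interaction by permuting residuals of the additive model.
set.seed(1)
n = 120
a = factor(rep(c("a1", "a2"), each = n/2))
b = factor(rep(c("b1", "b2", "b3"), times = n/3))
y = 1 + (a == "a2") + 0.5*(b == "b2") + rnorm(n)   # additive truth: H0 holds

fit0  = lm(y ~ a + b)                      # reduced model, no interaction
F.obs = anova(fit0, lm(y ~ a * b))$F[2]    # F statistic for the interaction

F.prm = replicate(999, {
  y.star = fitted(fit0) + sample(resid(fit0))   # permuted residuals, main effects kept
  anova(lm(y.star ~ a + b), lm(y.star ~ a * b))$F[2]
})
p.val = (1 + sum(F.prm >= F.obs)) / (999 + 1)
p.val
```

Permuting `y` itself would destroy the main effects of `a` and `b`, which are allowed under the null of additivity; permuting residuals keeps them in place.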

Thomas Lumley

There are many, many situations where exchangeability of values in a sequence does not hold. One general scenario is when you have a time-series of values that are autocorrelated, so that values near each other in time are statistically related. For example, if we produce a random walk, the values in the random walk are not exchangeable, and this will be extremely obvious by comparing a plot of the random walk to a plot of a random permutation of that random walk.

#Generate and plot a one-dimensional random walk
set.seed(1);
n <- 10000;
MOVES <- sample(c(-1, 1), size = n, replace = TRUE);
WALK  <- cumsum(MOVES);
plot(WALK, type = 'p',
     main = 'Plot of a Random Walk',
     xlab = 'Time', ylab = 'Value');

#Plot a random permutation of the random walk
PERM <- sample(WALK, size = n, replace = FALSE);
plot(PERM, type = 'p',
     main = 'Plot of a Randomly Permuted Random Walk',
     xlab = 'Time', ylab = 'Value');

[Figure: Plot of a Random Walk (Value against Time)]

[Figure: Plot of a Randomly Permuted Random Walk (Value against Time)]

We can see from these plots that the random permutation jumbles the order of the points so that values near each other in time are no longer near to each other in value. Any moderately sensible runs test will easily detect that the first plot involves a vector of values that is not exchangeable.
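The same point can be made numerically. The sketch below does not use a runs test; as a simpler stand-in it compares the lag-1 autocorrelation of the walk with that of random permutations of it. The helper `lag1` and the lower-case variable names are hypothetical, and the walk is regenerated so that the block runs on its own.

```r
#Regenerate the random walk
set.seed(1);
n <- 10000;
walk <- cumsum(sample(c(-1, 1), size = n, replace = TRUE));

#Lag-1 autocorrelation: correlation of each value with its predecessor
lag1 <- function(x) { cor(x[-1], x[-length(x)]) }

#Observed value versus values under random permutation
r.obs <- lag1(walk);
r.prm <- replicate(500, lag1(sample(walk)));
r.obs;
range(r.prm);
mean(abs(r.prm) >= abs(r.obs));   #Permutation p-value
```

The observed autocorrelation is close to one, while the permuted series give values scattered around zero, so the ordering of the walk clearly carries information that permutation destroys.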

Ben