What normality assumptions are required for an unpaired t-test? And when are they met?

Question

If we wish to conduct a paired t-test, the requirement is (if I understand correctly) that the mean difference between the matched units of measurement will be distributed normally.

In the paired t-test, this is expressed (AFAIK) as the demand that the differences between the matched units of measurement be normally distributed (even if the distribution of each of the two compared groups is not normal).

However, in an unpaired t-test we cannot talk about the difference between matched units, so we require the observations from the two groups to be normal so that the difference of their means will be normal. Which leads me to my question:

Is it possible for two non-normal distributions to have sample means whose difference IS normally distributed? (And thus to satisfy the requirement for performing an unpaired t-test on them, again as far as I understand.)

Update: (thank you all for the answers) I see that the general rule we are looking for is indeed that the difference of the means will be normal, which seems to be a good assumption (under large enough n) due to the CLT. This is amazing to me (not surprising, just amazing): it works well for the unpaired t-test, but not as well for the single-sample t-test. Here is some R code to illustrate:

n1 <- 10
n2 <- 10
mean1 <- 50
mean2 <- 50
R <- 10000

# diffs <- replicate(R, mean(rexp(n1, 1/mean1)) - mean(runif(n2, 0, 2*mean2)))
# hist(diffs)

P <- numeric(R)
MEAN <- numeric(R)
for(i in seq_len(R))
{
    y1 <- rexp(n1, 1/mean1)
    y2 <- runif(n2, 0, 2*mean2)
    MEAN[i] <- mean(y1) - mean(y2)
    P[i] <- t.test(y1,y2)$p.value
}
par(mfrow = c(1,2))
hist(P)
qqplot(P, runif(R)); abline(0,1)
# Empirical type I error rate. For n1 = n2 = 10 this gave 0.0715, above the nominal 0.05;
# the inflation only shows up for small n1 and n2 and disappears for larger ones.
sum(P<.05) / R



n1 <- 100
mean1 <- 50
R <- 10000
P_y1 <- numeric(R)

for(i in seq_len(R))
{
    y1 <- rexp(n1, 1/mean1)
    P_y1[i] <- t.test(y1 , mu = mean1)$p.value
}

par(mfrow = c(1,2))
hist(P_y1)
qqplot(P_y1, runif(R)); abline(0,1)
sum(P_y1<.05) / R # for n1 = 100 -> 0.057  # "wrong" type I error

Thanks.

**Sure**. Let $(X_i,Y_i)$ be your iid bivariate sample. Let $X_i$ have an *arbitrary* distribution $F$ and take $Y_i = X_i + Z_i$ where $\{Z_i\}$ are iid $\mathcal{N}(0,\sigma^2)$. — cardinal, Dec 11 '11 at 19:31
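
A minimal sketch of the construction in the comment above, assuming an exponential distribution for $F$ and $\sigma = 2$ (both chosen only for illustration, as are the seed and sample size):

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n     <- 1000
sigma <- 2                           # assumed value of sigma

x <- rexp(n, rate = 1/50)            # X_i: clearly non-normal (one possible choice of F)
z <- rnorm(n, mean = 0, sd = sigma)  # Z_i ~ N(0, sigma^2)
y <- x + z                           # Y_i = X_i + Z_i, also non-normal

d <- y - x                           # paired differences recover Z_i (up to rounding)

shapiro_x <- shapiro.test(x)$p.value # tiny: X is far from normal
shapiro_d <- shapiro.test(d)$p.value # normality of the differences is usually not rejected
qqnorm(d); qqline(d)
```

Neither margin is normal here, yet the paired differences are, which is all the paired t-test asks for.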

score 18 · Accepted Answer · answered Dec 11 '11 at 19:38

In practice, the Central Limit Theorem assures us that, under a wide range of assumptions, the distributions of the two sample means being tested will themselves approach Normal distributions as the sample sizes get large, regardless (this is where the assumptions come in) of the distributions of the underlying data. As a consequence, as the sample size gets larger, the difference of the means becomes approximately normally distributed, and the requirements necessary for the t-statistic of an unpaired t-test to have the nominal t distribution are approximately satisfied. Thus, a more practically applicable question might be: how large does the sample size have to be before I can safely ignore the difference between the actual distribution of the statistic and the t distribution?

In many cases, the answer is "not very large", especially when the underlying distributions are pretty close to symmetric. For example, I simulated 100,000 tests comparing the means of two Uniform(0,1) distributions, each with sample size 10, and, when testing at the 95% level of confidence, actually rejected the null 5.19% of the time - hardly different from the nominal 5% rejection rate we're hoping for (although it is about 2.7 standard deviations above 5%).
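
A simulation along these lines could look roughly like the sketch below. It assumes R's default `t.test` (the Welch variant), since the answer does not say which variant was used, and it runs fewer replications than the 100,000 mentioned above to keep it quick. The last two lines check the "about 2.7 standard deviations" remark using the binomial standard error of a rejection rate.

```r
set.seed(42)            # arbitrary seed
n_tests <- 20000        # fewer than the 100,000 in the text, to keep it quick
n       <- 10           # sample size per group

# One p-value per simulated pair of Uniform(0,1) samples; the null is true by construction
p_vals <- replicate(n_tests, t.test(runif(n), runif(n))$p.value)
reject_rate <- mean(p_vals < 0.05)   # empirical type I error rate
reject_rate

# Monte Carlo standard error of a rejection rate when the true level is 5% and 100,000 tests are run
se_nominal <- sqrt(0.05 * 0.95 / 1e5)
z_reported <- (0.0519 - 0.05) / se_nominal  # how far the reported 5.19% sits above 5%
z_reported
```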

This is why people use the t-test in all sorts of situations where the underlying assumptions are not actually met, but of course your mileage may vary, depending upon the specifics of your problem. However, there are other tests that don't require Normality, such as the Wilcoxon test, which, even when the data is Normally distributed, is, asymptotically, about 95% as efficient as the t-test (i.e., requires a sample size of N/0.95 to have the same power as a t-test with a sample size of N, as N goes to infinity). When the data isn't Normally distributed, it can be (not necessarily will be) a lot better than the t-test.
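
To get a feel for the "can be a lot better" case, here is a rough power comparison on skewed data. The lognormal distributions, the shift of 0.7 on the log scale, the sample size, and the number of replications are all assumptions chosen for illustration; `wilcox.test` with two samples is the rank-sum (Mann-Whitney) version of the test.

```r
set.seed(7)             # arbitrary seed
n_sim <- 2000           # number of simulated data sets
n     <- 30             # sample size per group
shift <- 0.7            # assumed location shift on the log scale

# Simulate two lognormal samples and record whether each test rejects at the 5% level
rejections <- replicate(n_sim, {
  x <- rlnorm(n, meanlog = 0,     sdlog = 1)
  y <- rlnorm(n, meanlog = shift, sdlog = 1)
  c(t        = t.test(x, y)$p.value < 0.05,
    wilcoxon = wilcox.test(x, y)$p.value < 0.05)
})

power <- rowMeans(rejections)   # estimated power of each test
power
```

Swapping `rlnorm` for `rnorm` in the same sketch gives the Normal-data case, where the two tests should come out close to each other.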

In my experience the required sample size for the $t$ distribution to be accurate is often larger than the sample size at hand. The Wilcoxon signed-rank test is extremely efficient as you said, and it is robust, so I almost always prefer it over the $t$ test. — Frank Harrell, Dec 11 '11 at 20:21
Thanks Frank - your comment helped me articulate a question which is closer to what I am after: http://stats.stackexchange.com/questions/19681/when-to-use-the-wilcoxon-rank-sum-test-instead-of-the-unpaired-t-test — Tal Galili, Dec 11 '11 at 21:14

score 2 · Answer 2 · answered Dec 11 '11 at 19:28

Of course. If this weren't the case then the independent samples t-test wouldn't be of much use. We do need larger sample sizes, though, because testing for a difference in means between two non-normal populations relies on the CLT.

For a quick example, let's assume we have population 1 coming from an exponential with mean 25 and population 2 being uniformly distributed with mean 30. We'll even give them different sample sizes. We can examine what the distribution of the differences in sample means looks like fairly easily in R using the replicate function.

n1 <- 30
n2 <- 25
mean1 <- 25
mean2 <- 30

diffs <- replicate(10000, mean(rexp(n1, 1/mean1)) - mean(runif(n2, 0, 2*mean2)))
hist(diffs)

Playing around with the sample sizes will show that at low sample sizes we don't really have normality, but increasing the sample size gives us a more normal-looking sampling distribution for the difference in means. Of course, you could change the distributions used in this example to explore further.
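
One way to do that exploration, sketched with the same two distributions. The helper names `sim_diffs` and `skewness` and the particular sample sizes are made up for this illustration; sample skewness is used as a crude numeric summary of how far the sampling distribution is from symmetric.

```r
set.seed(123)           # arbitrary seed
mean1 <- 25
mean2 <- 30

# Differences in sample means for given sample sizes (same distributions as above)
sim_diffs <- function(n1, n2, reps = 10000) {
  replicate(reps, mean(rexp(n1, 1/mean1)) - mean(runif(n2, 0, 2*mean2)))
}

# Sample skewness; close to 0 for a symmetric distribution such as the normal
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

sizes <- c(5, 30, 200)
skews <- sapply(sizes, function(n) skewness(sim_diffs(n, n)))
names(skews) <- paste0("n=", sizes)
skews                   # skewness should shrink toward 0 as n grows

par(mfrow = c(1, 3))
for (n in sizes) hist(sim_diffs(n, n), main = paste("n1 = n2 =", n), xlab = "diffs")
```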
