If we wish to conduct a paired t-test, the requirement is (if I understand correctly) that the mean difference between the matched units of measurement is normally distributed.
In the paired t-test, this is articulated (AFAIK) as the demand that the differences between the matched units of measurement are normally distributed (even if the distribution of each of the two compared groups is not normal).
However, in an unpaired t-test, we cannot talk about the difference between matched units, so we require the observations from the two groups to be normal so that the difference of their means will be normal. Which leads me to my question:
Is it possible for two distributions to be non-normal and yet for the difference of their sample means to be normally distributed? (And thus satisfy the requirement for performing an unpaired t-test on them, again, as far as I understand.)
Update: (thank you all for the answers) I see that the general rule we are looking for is indeed that the difference of the means is normal, which seems to be a good assumption (for large enough n) due to the CLT. What amazes me (not surprising, just amazing) is that this works well for the unpaired t-test but not as well for the one-sample t-test. Here is some R code to illustrate:
n1 <- 10
n2 <- 10
mean1 <- 50
mean2 <- 50
R <- 10000
# diffs <- replicate(R, mean(rexp(n1, 1/mean1)) - mean(runif(n2, 0, 2*mean2)))
# hist(diffs)
P <- numeric(R)
MEAN <- numeric(R)
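for (i in seq_len(R)) {
  y1 <- rexp(n1, 1/mean1)          # exponential sample with mean mean1
  y2 <- runif(n2, 0, 2*mean2)      # uniform sample with mean mean2
  MEAN[i] <- mean(y1) - mean(y2)   # difference of the two sample means
  P[i] <- t.test(y1, y2)$p.value   # Welch two-sample t-test (R's default)
}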
par(mfrow = c(1,2))
hist(P)
qqplot(P, runif(R)); abline(0,1)
sum(P < .05) / R # for n1 = n2 = 10 -> 0.0715: inflated type I error, but only for small n1 and n2 (the effect disappears for larger ones)
n1 <- 100
mean1 <- 50
R <- 10000
P_y1 <- numeric(R)
for (i in seq_len(R)) {
  y1 <- rexp(n1, 1/mean1)                       # exponential sample with mean mean1
  P_y1[i] <- t.test(y1, mu = mean1)$p.value     # one-sample t-test against the true mean
}
par(mfrow = c(1,2))
hist(P_y1)
qqplot(P_y1, runif(R)); abline(0,1)
sum(P_y1 < .05) / R # for n1 = 100 -> 0.057 # "wrong" type I error, even with the larger sample
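To compare the two cases side by side, the simulations above could be wrapped in a small helper and rerun for several sample sizes. This is only a rough sketch: `type1_rate` is a hypothetical helper of my own (not from any package), and the seed, the sample sizes, and the reduced number of replications (2000) are arbitrary choices.

```r
# Hypothetical helper: estimate how often a test rejects at level alpha
# when the null hypothesis is true. one_draw() must return one p-value.
type1_rate <- function(R, alpha, one_draw) {
  p <- replicate(R, one_draw())
  mean(p < alpha)
}

set.seed(1)               # arbitrary seed, only for reproducibility
mean0 <- 50               # common true mean, as in the code above
sizes <- c(10, 30, 100)   # arbitrary sample sizes to compare

# Welch two-sample t-test: exponential vs. uniform, both with mean mean0
two_sample <- sapply(sizes, function(n)
  type1_rate(2000, 0.05, function()
    t.test(rexp(n, 1 / mean0), runif(n, 0, 2 * mean0))$p.value))

# One-sample t-test on exponential data against its true mean
one_sample <- sapply(sizes, function(n)
  type1_rate(2000, 0.05, function()
    t.test(rexp(n, 1 / mean0), mu = mean0)$p.value))

rates <- data.frame(n = sizes, two_sample = two_sample, one_sample = one_sample)
print(rates)
```

Each row of `rates` gives the estimated type I error of both tests at the same n, so the two columns can be compared directly against the nominal 0.05.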
Thanks.