It is well known that Welch's t-test is robust to violations of the normality assumption, and it is arguably underused by researchers.1 It is, of course, the default t-test in R. In terms of containing the false positive error rate, just how robust is Welch's test? I'm interested in really punishing the test and seeing how much abuse it can actually take. Running a few simulations, I found the results quite remarkable.

Statistical tests on sample sizes of n = 3 are routine in published biological research; check almost any issue of Science. I know this is anathema to statisticians, but it is nevertheless common. So let's take samples of n = 3 from any two distributions we like, set them to have the same mean, and simulate the Welch test p values.
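To make the setup concrete, here is a minimal sketch of the kind of simulation I mean. The exponential/normal pair, the 10,000 replicates, and the .05 cutoff are just illustrative choices; any two distributions with equal means will do.

    ## Simulated false positive rate for Welch's t-test with n = 3 per group.
    ## Both populations have mean 1, so every rejection at alpha = .05 is a false positive.
    set.seed(1)
    nsim <- 1e4                     # increase for a more stable estimate

    pvals <- replicate(nsim, {
      x <- rexp(3, rate = 1)        # right-skewed, mean 1
      y <- rnorm(3, mean = 1)       # symmetric, mean 1
      t.test(x, y)$p.value          # t.test() defaults to Welch (var.equal = FALSE)
    })

    mean(pvals < .05)               # simulated false positive error rate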
Similar simulations were performed here:
"false positive error rate from skewed distributions"
Changing the sample sizes to n = 3, the highest false positive error rate I can obtain is about .13.
Similar results are obtained from other skewed distributions, such as beta distributions, but Chi-squared distributions skewed in opposite directions are the most punishing pair I can find.
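For concreteness, one way to construct such an opposite-direction pair is to mirror a Chi-squared variable around its mean, so one group is right-skewed and the other left-skewed, with both means equal. The df = 1 choice below is mine and not necessarily the exact setup used in the linked simulations.

    ## Chi-squared populations skewed in opposite directions, both with mean 1.
    set.seed(2)
    nsim <- 1e4

    pvals <- replicate(nsim, {
      x <- rchisq(3, df = 1)        # right-skewed, mean 1
      y <- 2 - rchisq(3, df = 1)    # mirror image around 1: left-skewed, mean 1
      t.test(x, y)$p.value          # Welch by default
    })

    mean(pvals < .05)               # simulated false positive error rate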
Forget about power. Forget good statistical practice (for now).
What's the highest simulated false positive error rate that anyone can produce from samples of n = 3 from any distributions using Welch's test? Bonus marks for anyone who can provide a proof (not a simulation) of the theoretical upper limit.