
Let $X_i$ be a collection of $n$ independent identically distributed Gaussian random variables with shared mean and variance $(\mu,\sigma^2)$.

Let $\bar X$ and $\tilde \sigma$ be their empirical mean and standard deviation:

$$ \bar X = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad (\tilde \sigma)^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar X)^2 $$

From these quantities, we can construct:

$$ Z = \frac{\sqrt n \bar X}{\tilde \sigma} $$

which follows a Student $t$ distribution with $n-1$ degrees of freedom if $\mu = 0$, justifying the use of this statistic to test the hypothesis $\mu = 0$.
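
Recall that this follows from classical Gaussian sampling theory: $\bar X$ and $\tilde \sigma^2$ are independent, with

$$ \frac{\sqrt n \, \bar X}{\sigma} \sim \mathcal N(0,1), \qquad \frac{(n-1) \, \tilde \sigma^2}{\sigma^2} \sim \chi^2_{n-1}, $$

so that $Z = \dfrac{\sqrt n \, \bar X / \sigma}{\tilde \sigma / \sigma}$ is the ratio of a standard Gaussian to the square root of an independent $\chi^2_{n-1}/(n-1)$, i.e. a $t_{n-1}$ random variable.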

My question relates to whether this test is "robust" to the $X_i$ being non-Gaussian: is the t-test a good idea to use regardless or not?

Indeed, now consider $n$ IID $Y_i$ with non-Gaussian distribution with mean and variance $(\mu,\sigma^2)$. Is there a convergence result of some kind such that:

$$ Z_{n,Y} = \frac{\sqrt n \bar Y_n}{\tilde \sigma_{n,Y}} $$

converges to a Student distribution when $\mu=0$?
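
For concreteness, here is a small simulation sketch of what such a convergence would look like (only an illustration of the question, not an attempt at an answer), comparing $Z_{n,Y}$ for centred exponential data (so $\mu = 0$) against the quantiles of a Student distribution with $n-1$ degrees of freedom:

set.seed(1)
n <- 10
# simulate Z_{n,Y} for Y ~ Exp(1) - 1 and compare to t quantiles with n-1 df
Z <- replicate(20000, {
    Y <- rexp(n) - 1
    sqrt(n) * mean(Y) / sd(Y)
})
qqplot(qt(ppoints(length(Z)), df = n - 1), Z,
       xlab = "t quantiles (df = n - 1)", ylab = "simulated Z")
abline(0, 1)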

There is a conventional response to this question, which I find deeply unsatisfying. It goes thus:

  • $\tilde \sigma_{n,Y}$ will converge to the constant $\sigma$ because of the law of large numbers

  • $\bar Y_n$ will converge to a Gaussian because of the central limit theorem

  • Thus $Z_{n,Y}$ will have a Gaussian distribution as $n \rightarrow \infty$. Since the Student distribution with $n-1$ degrees of freedom also tends to a Gaussian as $n \rightarrow \infty$, everything is fine
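
In symbols, the first two steps are a Slutsky-type statement: when $\mu = 0$,

$$ \sqrt n \, \bar Y_n \xrightarrow{d} \mathcal N(0,\sigma^2) \quad \text{and} \quad \tilde \sigma_{n,Y} \xrightarrow{P} \sigma, \qquad \text{so that} \qquad Z_{n,Y} = \frac{\sqrt n \, \bar Y_n}{\tilde \sigma_{n,Y}} \xrightarrow{d} \mathcal N(0,1). $$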

This argument is unsatisfying to me because it does nothing to single out the Student distribution over any other distribution that also converges to a Gaussian, as the Student distribution does. I'm looking for a result that goes beyond this simple asymptotic argument.

Guillaume Dehaene
  • The second question about asymptotics is answered here, http://stats.stackexchange.com/a/44280/28746 – Alecos Papadopoulos Oct 24 '16 at 09:31
  • It's not sufficient that the numerator be normal and the denominator an appropriately scaled chi-distribution. You also need that they're independent to get that the ratio is $t$-distributed. Any time you don't have normality, you also won't have independence. – Glen_b Oct 24 '16 at 11:48
  • @AlecosPapadopoulos: indeed, one can justify the use of the Student distribution through normal limit behavior and the CLT, but this seems incredibly silly. Why don't we start using the Gamma distribution then to build confidence intervals? It also asymptotes to the normal after all! I'd like to get beyond this poor-man's asymptotic result. For Glen_b: indeed, we would also need independence. I'll rewrite a clearer question "soon" – Guillaume Dehaene Oct 24 '16 at 14:56
  • Please clarify also the following: your question seems to ask about what happens asymptotically, but in your comment it appears you are wondering whether, in practice, we can use not the normal but the Student t itself, as an approximation. – Alecos Papadopoulos Oct 24 '16 at 19:10
  • The standard error of $\bar{x}$ is $\sigma/\sqrt{n}$ not $\sigma \sqrt{n}$... – Glen_b Oct 25 '16 at 09:55
  • @Glen_b I believe one could nevertheless conclude that the sequence converges to a standard Normal distribution. Exploit the fact that the dependence between the mean and SD grows weaker as $n$ increases, and apply a generalization of the CLT. – whuber Oct 26 '16 at 14:44
  • @whuber That much is clearly the case (e.g. dividing numerator and denominator of the t-statistic $\frac{\bar{X}-\mu_0}{s/\sqrt{n}}$ by $\sigma/\sqrt{n}$, if the CLT applies to the random variable in the numerator and $s$ is a consistent estimator of $\sigma$ being used in the denominator, we could use Slutsky's theorem to argue that the distribution of the ratio converges to the distribution of the term in the numerator). But none of that gives us a t-distribution at finite $n$, just two different things each converging in the limit to the same distribution. – Glen_b Oct 26 '16 at 20:26
  • @Glen_b That's what puzzles me: this question is not about any finite $n$. Since all we're told about the distribution is that it's "non-Gaussian" and has finite variance, it seems there's little to say about finite $n$ in general. Maybe somebody could volunteer a set of simple conditions that would assure "near-t" distributions of the statistic for "small" $n$. – whuber Oct 26 '16 at 22:07
  • whuber: I'm indeed looking for a "small n asymptotic result" which goes beyond the traditional argument that the empirical variance converges almost surely to the true variance. Such a result would provide the simple conditions which you are talking about. I'm guessing that the fourth or fifth moment being finite might be sufficient. If not, having a moment generating function is probably more than sufficient. – Guillaume Dehaene Oct 27 '16 at 09:58
  • For such a fundamental question, this is not getting a lot of attention. I was trying to think about how one would go about proving this convergence in a satisfying manner. The right way seems to be to prove total-variation convergence of the pair $(\bar Y, \tilde \sigma_Y)$ to the pair $(\bar X, \tilde \sigma_X)$. This proves the convergence of $Z_Y$ to $Z_X$ in total variation. I might tackle this at some point in the future. – Guillaume Dehaene Nov 01 '16 at 08:20

1 Answer


Here is an attempt at answering the question with numerical experiments: using Monte Carlo estimation, it is easy to determine the rate of type I errors of the test for a given distribution of input data. Here I try data from the following distributions:

  • Normally distributed data: here the t-test is guaranteed to work.

  • Samples from the uniform distribution on $[-1,1]$: this is a prototype for a distribution with light tails (or rather, the extreme case of no tails).

  • The double-exponential distribution: this is a distribution with heavier tails than the normal distribution has.

  • A shifted exponential distribution, $\mathrm{Exp}(1) - 1$: this is a very asymmetric distribution, with a tail only on one side.

  • The discrete uniform distribution on the set $\{-1,+1\}$: this could be seen as an extreme case of a bi-modal distribution.

  • The discrete distribution with $P(X=-1) = 0.9$ and $P(X=9)=0.1$: this is very far from a normal distribution because it is both discrete and very asymmetric.

Since we expect the test to get more accurate as $n$ increases, I try only small and moderate values of $n$, namely $n \in \{10, 30, 100\}$. For the significance level I choose the commonly used value $\alpha = 5\%$.

My experiment is performed using the following R script: the script simulates $N=1,000,000$ datasets of size $n$, applies the t-test, and counts how often $H_0\colon \mu=0$ is (wrongly) rejected. If the t-test still works, this should happen in $5\%$ of the cases; any deviation from $5\%$ indicates that, for the given distribution and $n$, the t-test did not perform optimally.

Edit: As requested by the OP, I have changed the code to also perform the same experiments for the z-test, so that the performance of both tests can be compared.

set.seed(1)

# Estimate the two-sided rejection rates (type I error rates) of the t-test
# and the z-test at level alpha, together with their Monte Carlo standard
# errors, for N simulated samples of size n drawn from gen().
try.one <- function(gen, n, N=1000000, alpha=0.05) {
    crit.t <- qt(1 - alpha/2, n-1)
    reject.t <- 0
    crit.z <- qnorm(1 - alpha/2)
    reject.z <- 0
    for (j in 1:N) {
        X <- gen(n)
        Z <- sqrt(n) * mean(X) / sd(X)
        if (abs(Z) > crit.t) {
            reject.t <- reject.t + 1
        }
        if (abs(Z) > crit.z) {
            reject.z <- reject.z + 1
        }
    }
    p.t <- reject.t/N
    p.z <- reject.z/N
    list(prob.t=p.t, sd.t=sqrt(p.t*(1-p.t)/N), prob.z=p.z, sd.z=sqrt(p.z*(1-p.z)/N))
}

distributions <- c("normal", "uniform", "double exponential", "exponential",
    "discrete", "asym. discrete")

try.all <- function() {
    dist.name <- character(0)
    nn <- numeric(0)
    fp.rate.t <- numeric(0)
    std.err.t <- numeric(0)
    fp.rate.z <- numeric(0)
    std.err.z <- numeric(0)
    for (dist in distributions) {
        if (dist == "normal") {
            gen <- rnorm
        } else if (dist == "uniform") {
            gen <- function(n) runif(n, -1, 1)
        } else if (dist == "double exponential") {
            gen <- function(n) rexp(n) * sample(c(-1,1), n, replace=TRUE)
        } else if (dist == "exponential") {
            gen <- function(n) rexp(n) - 1
        } else if (dist == "discrete") {
            gen <- function(n) sample(c(-1,1), n, replace=TRUE)
        } else if (dist == "asym. discrete") {
            gen <- function(n) sample(c(-1, 9), n, replace=TRUE, prob=c(0.9,0.1))
        }
        for (n in c(10, 30, 100)) {
            row <- try.one(gen, n)

            dist.name <- c(dist.name, dist)
            nn <- c(nn, n)
            fp.rate.t <- c(fp.rate.t, row$prob.t)
            std.err.t <- c(std.err.t, row$sd.t)
            fp.rate.z <- c(fp.rate.z, row$prob.z)
            std.err.z <- c(std.err.z, row$sd.z)
        }
    }
    data.frame(dist.name, n=nn, fp.rate.t, std.err.t, fp.rate.z, std.err.z)
}

print(try.all(), row.names=FALSE)

The output, after some minutes, is

          dist.name   n fp.rate.t    std.err.t fp.rate.z    std.err.z
             normal  10  0.050029 0.0002180048  0.081694 0.0002738980
             normal  30  0.050059 0.0002180667  0.059824 0.0002371605
             normal 100  0.049930 0.0002178004  0.052726 0.0002234859
            uniform  10  0.054490 0.0002269820  0.084445 0.0002780540
            uniform  30  0.050906 0.0002198058  0.060263 0.0002379735
            uniform 100  0.050116 0.0002181843  0.053001 0.0002240355
 double exponential  10  0.042272 0.0002012090  0.074645 0.0002628177
 double exponential  30  0.047506 0.0002127185  0.057410 0.0002326244
 double exponential 100  0.049646 0.0002172125  0.052526 0.0002230852
        exponential  10  0.099738 0.0002996503  0.130045 0.0003363529
        exponential  30  0.072758 0.0002597389  0.082090 0.0002745018
        exponential 100  0.058040 0.0002338191  0.060755 0.0002388804
           discrete  10  0.021386 0.0001446673  0.109666 0.0003124730
           discrete  30  0.042853 0.0002025256  0.042853 0.0002025256
           discrete 100  0.056972 0.0002317891  0.056972 0.0002317891
     asym. discrete  10  0.350463 0.0004771150  0.350463 0.0004771150
     asym. discrete  30  0.044408 0.0002059998  0.191153 0.0003932093
     asym. discrete 100  0.067916 0.0002516017  0.067916 0.0002516017

Some observations about these results:

  • The rate of type I errors is listed in the columns fp.rate.t (t-test) and fp.rate.z (z-test). As expected, for the normal distribution the t-test rate is very close to $5\%$.

  • In nearly all cases, the rate of type I errors gets closer to $5\%$ as $n$ increases, sometimes from below and sometimes from above. The only exception is the asymmetric discrete distribution.

  • The weight of the tails seems not to have too much effect: the test performs reasonably well for both uniform and double exponential distributions.

  • For small sample size ($n=10$) there are notable deviations of the type I error rate from $5\%$, both for the discrete distributions and for the asymmetric distributions.

  • The worst case is the discrete, asymmetric distribution where the t-test at $5\%$-level shows type I errors in $35\%$ of the cases. Given this huge discrepancy, I would argue that care is required when attempting to use the $t$-test for distributions which are far from normal.

Edit: Using the updated code, we can also compare the performance of the t-test to the performance of a z-test (still using the sample variance):

  • As expected, for normally distributed data the z-test performs worse than the t-test (because the normal critical values do not account for the variance being estimated). The effect is quite noticeable for $n=10$ and nearly disappears for $n=100$. For $n=10$, the t-test seems superior to the $z$-test (using estimated variances) for all examples tested.

  • The worst case (asymmetric + discrete, $n=10$) is equally bad for both tests.

  • For $n=100$ the results of both tests are very similar, but in some cases the t-test seems to perform slightly better.

This experiment only considers the type I error, but experiments along similar lines could be used to compare type II errors between distributions.
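
For example, a minimal sketch of such a power experiment could look like the following (the helper power.one and the shift delta are just illustrative choices, reusing the shifted exponential from above but with true mean delta, so that $H_0\colon \mu=0$ is false):

# Sketch: estimate the power (1 - type II error rate) of the two-sided t-test
# for shifted exponential data with true mean delta.
power.one <- function(n, delta, N=100000, alpha=0.05) {
    crit.t <- qt(1 - alpha/2, n-1)
    reject <- 0
    for (j in 1:N) {
        X <- rexp(n) - 1 + delta  # true mean is delta, so H0 is false
        Z <- sqrt(n) * mean(X) / sd(X)
        if (abs(Z) > crit.t) {
            reject <- reject + 1
        }
    }
    reject/N
}

power.one(30, 0.5)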

jochen
  • Nice job. Could you please also compare the false positive rate for a Gaussian test instead of a Student test? The question also asks whether the t-test is superior to the (asymptotically valid) normal approximation. I'm curious to see whether the values for $n=100$ are much higher or not. Thank you very much for these simulations anyway. – Guillaume Dehaene Nov 02 '16 at 12:25
  • I have now added a comparison to the z-test. Was this what you had in mind? Since it seemed like "cheating" to use the exact variances, I used the z-test with estimated variances, which of course gives the t-test an advantage for small sample sizes. – jochen Nov 02 '16 at 17:56
  • This is a great answer, the only thing that could be improved is to set the seed value to 42. – Repmat Nov 02 '16 at 18:31
  • @jochen: you did exactly what I meant. Thank you very much for running all the simulations. Like you said, it does seem like the Student t-test has faster convergence than the z-test with the empirical variance, except for your last two examples. Very interesting. – Guillaume Dehaene Nov 03 '16 at 11:36