I'm trying to understand the t-test, and I don't quite understand the difference I'm seeing between the t-statistic and the number of standard deviations between the means of a Gaussian random variable when the number of samples is large.
Let's say I have two populations $x_1$ and $x_2$, each assumed to be normally distributed with roughly equal variance, and I obtain $N=100$ samples from each population, with sample statistics $\bar{x}_1=10.44$, $\bar{x}_2=12.01$, $\sigma_{x1} = 1.33$, $\sigma_{x2} = 1.49$.
I want to know my confidence level that the population mean of $x_2$ is greater than the population mean of $x_1$.
Applying the t-test gives me
$$ t=\frac{\bar{x}_2-\bar{x}_1}{\sqrt{\frac{\sigma_{x1}^2+\sigma_{x2}^2}{N}}} = 7.860816 $$
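(As a quick sanity check on the arithmetic, here's that formula evaluated with the sample statistics above, in plain Python:)

>>> import math
>>> N = 100
>>> (12.01 - 10.44) / math.sqrt((1.33**2 + 1.49**2) / N)   # ≈ 7.860816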
Plugging this into scipy's survival function scipy.stats.t.sf (= 1 - CDF), with $2N-2 = 198$ degrees of freedom, I compute a probability of about $10^{-13}$:
>>> import scipy.stats
>>> scipy.stats.t.sf(7.860816, 198)
1.1995112158126528e-13
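As a cross-check, scipy's pooled two-sample test computed directly from summary statistics gives the same statistic, and half its two-sided p-value matches the one-sided value above (this assumes the $\sigma$ values above are ddof=1 sample standard deviations, which is what scipy expects):

>>> from scipy.stats import ttest_ind_from_stats
>>> # assumes the sigmas above are ddof=1 sample standard deviations
>>> t, p = ttest_ind_from_stats(12.01, 1.49, 100, 10.44, 1.33, 100, equal_var=True)
>>> t       # ≈ 7.860816, same statistic as the hand computation
>>> p / 2   # two-sided p-value halved: ≈ 1.2e-13, matching t.sf above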
It looks like $t$ could also be considered a normalized version of $v = x_2 - x_1$, where $\bar{v} = \bar{x}_2 - \bar{x}_1$ and $\sigma_{v} = \sqrt{\sigma_{x1}^2+\sigma_{x2}^2}$:
$$t = \frac{\bar{v}}{\sigma_{v}/\sqrt{N}} = \frac{\sqrt{N}\,\bar{v}}{\sigma_{v}}$$
namely that $t$ is the random variable $v = x_2 - x_1$ normalized by the standard error of its mean, $\sigma_v/\sqrt{N}$, so that I would expect $\sigma_t \approx 1$ if I ran lots of these tests. (This neglects the fact that the sample statistics themselves have variation, which I think is where the difference comes from.)
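That expectation is easy to check by simulation: drawing many pairs of samples under the null (equal population means; I'll assume a common $\sigma$ of 1.4, roughly matching the data above) and computing $t$ for each pair gives a spread just slightly above 1:

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> N, reps = 100, 50000
>>> a = rng.normal(10.0, 1.4, size=(reps, N))   # assumed: both populations share mean 10.0, sigma 1.4
>>> b = rng.normal(10.0, 1.4, size=(reps, N))
>>> num = b.mean(axis=1) - a.mean(axis=1)
>>> den = np.sqrt((a.std(axis=1, ddof=1)**2 + b.std(axis=1, ddof=1)**2) / N)
>>> (num / den).std()   # ≈ 1.005; a t variable with 198 df has sd sqrt(198/196) ≈ 1.005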
And the probability of $t<0$ (i.e., that my hypothesis about the relative values of the population means is wrong) should therefore be in the ballpark of scipy.stats.norm.sf(7.860816):
>>> scipy.stats.norm.sf(7.860816)
1.9081952314663672e-15
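Taking the ratio of the two tail probabilities directly:

>>> scipy.stats.t.sf(7.860816, 198) / scipy.stats.norm.sf(7.860816)   # ≈ 63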
But this is about a factor of 63 smaller than the probability computed from the t-distribution. Is this really supposed to be that big a difference for large $N$?