I'm trying to understand the t-test, and I don't quite understand the difference I'm seeing between the t-statistic and the number of standard deviations between the means of a Gaussian random variable when the number of samples is large.
Let's say I have two populations $x_1$ and $x_2$, each assumed to be normally distributed with roughly equal variance, and I obtain $N=100$ samples from each population, with sample statistics $\bar{x}_1=10.44$, $\bar{x}_2=12.01$, $\sigma_{x1} = 1.33$, $\sigma_{x2} = 1.49$.
I want to know my confidence level that the population mean of $x_2$ is greater than the population mean of $x_1$.
Applying the t-test gives me
$$ t=\frac{\bar{x}_2-\bar{x}_1}{\sqrt{\frac{\sigma_{x1}^2+\sigma_{x2}^2}{N}}} = 7.860816 $$
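(As a quick sanity check on the arithmetic, here's that formula evaluated with the sample statistics above, in plain Python:)

>>> import math
>>> N = 100
>>> (12.01 - 10.44) / math.sqrt((1.33**2 + 1.49**2) / N)   # ≈ 7.860816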
Plugging this into scipy's survival function scipy.stats.t.sf (= 1 - CDF), with $2N-2 = 198$ degrees of freedom, I compute a probability of about $10^{-13}$:
>>> import scipy.stats
>>> scipy.stats.t.sf(7.860816, 198)
1.1995112158126528e-13
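As a cross-check, scipy's pooled two-sample test computed directly from summary statistics gives the same statistic, and half its two-sided p-value matches the one-sided value above (this assumes the $\sigma$ values above are ddof=1 sample standard deviations, which is what scipy expects):

>>> from scipy.stats import ttest_ind_from_stats
>>> # assumes the sigmas above are ddof=1 sample standard deviations
>>> t, p = ttest_ind_from_stats(12.01, 1.49, 100, 10.44, 1.33, 100, equal_var=True)
>>> t       # ≈ 7.860816, same statistic as the hand computation
>>> p / 2   # two-sided p-value halved: ≈ 1.2e-13, matching t.sf above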
It looks like $t$ could also be considered a normalized version of $v = x_2 - x_1$, where $\bar{v} = \bar{x}_2 - \bar{x}_1$ and $\sigma_{v} = \sqrt{\sigma_{x1}^2+\sigma_{x2}^2}$:
$$t = \frac{\bar{v}}{\sigma_{v}/\sqrt{N}} = \frac{\sqrt{N}\,\bar{v}}{\sigma_{v}}$$
namely that $t$ is the random variable $v = x_2 - x_1$ normalized by the standard error of its mean, $\sigma_v/\sqrt{N}$, so that I would expect $\sigma_t \approx 1$ if I ran lots of these tests. (This neglects the fact that the sample statistics themselves have variation, which I think is where the difference comes from.)
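That expectation is easy to check by simulation: drawing many pairs of samples under the null (equal population means; I'll assume a common $\sigma$ of 1.4, roughly matching the data above) and computing $t$ for each pair gives a spread just slightly above 1:

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> N, reps = 100, 50000
>>> a = rng.normal(10.0, 1.4, size=(reps, N))   # assumed: both populations share mean 10.0, sigma 1.4
>>> b = rng.normal(10.0, 1.4, size=(reps, N))
>>> num = b.mean(axis=1) - a.mean(axis=1)
>>> den = np.sqrt((a.std(axis=1, ddof=1)**2 + b.std(axis=1, ddof=1)**2) / N)
>>> (num / den).std()   # ≈ 1.005; a t variable with 198 df has sd sqrt(198/196) ≈ 1.005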
And the probability of $t<0$ (i.e., that my hypothesis about the relative values of the population means is wrong) should therefore be in the ballpark of scipy.stats.norm.sf(7.860816):
>>> scipy.stats.norm.sf(7.860816)
1.9081952314663672e-15
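Taking the ratio of the two tail probabilities directly:

>>> scipy.stats.t.sf(7.860816, 198) / scipy.stats.norm.sf(7.860816)   # ≈ 63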
But this is about a factor of 63 smaller than the probability computed from the t-distribution. Is this really supposed to be that big a difference for large $N$?