I am looking to simulate the t-distribution from first principles.
In particular, I want to understand how the distribution arises by comparing the mean of sample A of size $n_A$ (taken from population A) with the mean of sample B of size $n_B$ (from a distinct population B). Note that $n_A$ and $n_B$ are not necessarily equal.
The null hypothesis ($H_0$) of a t-test (independent samples) states that both samples come from the same population. I'm interpreting this as population A essentially being the same as population B.
To simulate the t-distribution, this is my plan (Python):
1. Create a large ($N = 10000$) array of normally distributed values with mean $10$ and standard deviation $2$. This serves as the single underlying population, because the null hypothesis assumes that the samples indeed come from the same underlying population.
2. Iterate 1000 times as follows:
   - (a) take a random sample of 20 elements (sample A) and another of 10 elements (sample B) from the underlying population
   - (b) calculate the t-statistic for this realisation of the samples
   - (c) record the t-statistic from each i$^\mathrm{th}$ iteration
3. Plot a histogram of all 1000 t-statistic scores.
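To make the plan concrete, here is a minimal sketch of the whole loop (my own code, not taken from any of the sources cited below). It uses `scipy.stats.ttest_ind`, which defaults to the pooled-variance (equal variances) test, as a black box for step 2b, which is exactly the calculation I am asking about:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Step 1: one large underlying population, per the null hypothesis.
population = rng.normal(loc=10, scale=2, size=10_000)

# Step 2: repeatedly draw two disjoint samples and record the t-statistic.
t_stats = []
for _ in range(1000):
    idx = rng.choice(population.size, size=30, replace=False)
    sample_a = population[idx[:20]]  # 2a: sample A, n_A = 20
    sample_b = population[idx[20:]]  # 2a: sample B, n_B = 10
    t, _ = stats.ttest_ind(sample_a, sample_b)  # 2b: pooled variance by default
    t_stats.append(t)                # 2c: record this iteration's t-statistic

# Step 3: histogram of the 1000 t-statistics (uncomment to plot).
# import matplotlib.pyplot as plt
# plt.hist(t_stats, bins=50, density=True)
# plt.show()
```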
However, point (2b) is where I have difficulty: what is the equation to calculate the t-statistic? I have found various resources online (each re-arranged slightly), but they don't appear to be entirely consistent.
Shoffma5 (slide 16) $$t = \frac{\mu_A - \mu_B}{\sqrt{ \frac{1/n_A+1/n_B}{\nu} }}\frac{1}{\sqrt{ s_A^2\big(n_A-1\big) + s_B^2\big(n_B-1\big) }}$$
ucdavis and statisticshowto $$t = \frac{\mu_A - \mu_B}{\sqrt{ \frac{1/n_A+1/n_B}{\nu}}}\frac{1}{\sqrt{ \Big(\sum A^2 - \frac{(\sum A)^2}{n_A}\Big) + \Big(\sum B^2 - \frac{(\sum B)^2}{n_B}\Big) }}$$
where $\mu_A$ and $\mu_B$ are the respective means of samples A and B, $s_A^2$ and $s_B^2$ are the respective sample variances, and $\nu = n_A + n_B - 2$ is the degrees of freedom.
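Incidentally, the sum-of-squares expression in the second formula looks like it should be the same quantity as $s_A^2(n_A-1)$ in the first, since $\sum A^2 - (\sum A)^2/n_A = \sum (A - \mu_A)^2$. A quick numerical check (my own, not from the cited pages) confirms this:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(10, 2, size=20)  # a stand-in for sample A

# Sum-of-squares form from the second formula:
ss = np.sum(a**2) - np.sum(a)**2 / a.size
# (n_A - 1) * s_A^2 form from the first (ddof=1 gives the sample variance):
via_var = (a.size - 1) * np.var(a, ddof=1)

print(np.isclose(ss, via_var))  # → True
```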
What is the correct equation to use to calculate the t-statistic (independent samples)?