Consider the following simple Python code:
import numpy as np
import scipy.stats as stats

np.random.seed(1000)  # fixed seed for reproducibility
for i in range(10):
    X = np.random.normal(0, 1, 2000000)
    Y = np.random.normal(0, 1, 2000000)
    print(stats.ttest_ind(X, Y))
Since both $X$ and $Y$ are sampled from the same normal distribution (same mean and same variance), I expected the returned p-values to be rather high. However, on my machine I get the following output:
(-0.63504239247902361, 0.52540080331521977)
(0.95766169154615455, 0.33823343778490922)
(0.77495702809122613, 0.43836509803793244)
(1.0889385249318819, 0.27618106157897093)
(0.97293351367380154, 0.33058640601636757)
(1.3556340433654583, 0.17521571107906755)
(0.58921373947134548, 0.55571793482831677)
(0.91391361607063426, 0.36076229350915234)
(0.62066063893057222, 0.5348229679159715)
(2.3510055029223147, 0.018722805219522423)
Notice that for $\alpha=0.05$ the last pair of samples would suggest that I should reject the null hypothesis, namely that $\mu_X = \mu_Y$.
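To quantify how often this happens, here is a minimal sketch (not part of my original run; it uses smaller samples and more repetitions, purely for illustration) that repeats the same-distribution experiment many times and counts the fraction of tests rejecting at $\alpha=0.05$:

```python
import numpy as np
import scipy.stats as stats

# Repeat the two-sample t-test on data drawn from the SAME distribution
# and tally how often p < alpha. Sample size is reduced to keep it fast.
np.random.seed(1000)
alpha = 0.05
n_trials = 200
rejections = 0
for _ in range(n_trials):
    X = np.random.normal(0, 1, 2000)
    Y = np.random.normal(0, 1, 2000)
    _, p = stats.ttest_ind(X, Y)
    if p < alpha:
        rejections += 1

# Under the null hypothesis the rejection fraction should be close to alpha.
print(rejections / n_trials)
```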
As a sanity check, I ran
stats.ttest_ind(X, X)
which yields (0.0, 1.0), as expected. Furthermore, I also ran the following:
import numpy as np
import scipy.stats as stats

np.random.seed(1000)  # fixed seed for reproducibility
for i in range(10):
    X = np.random.normal(0, 1, 2000000)
    Y = np.random.normal(10, 1, 2000000)  # different mean for Y
    print(stats.ttest_ind(X, Y))
which always yielded p-values of zero, as I expected.
The question: What is happening here? Why does this happen, and what am I missing?