
Rolling a single die repeatedly results in a uniform distribution. But if we roll multiple dice, the sum should be approximately normally distributed due to the Central Limit Theorem (CLT).

To verify this, I simulated rolling multiple dice a number of times to get a distribution of sums, then applied the Shapiro-Wilk test to calculate a p-value.

What seems strange is that as we increase the number of rolls, the histogram becomes smoother and apparently closer to normal, but the p-value tends to decline until it stays below 0.05 for any number of repetitions.

Both Python and R give the same results. Below are two pieces of code in Python. One plots the p-value vs. the number of rolls. The other repeats the experiment 1000 times and calculates the percentage of results supporting the null hypothesis with p > 0.05.

E.g., with 20 dice, the percentage for 100 rolls is about 94%; for 700 rolls, 75%; for 1500 rolls, 11%; for 2000 rolls, 0%.

How can we explain this phenomenon?

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def shapiro_p(n_dice, n_throws):
    # Roll n_dice dice n_throws times, average each throw,
    # and test the resulting sample for normality.
    rng = np.random.default_rng()
    throws = rng.integers(1, high=6, size=(n_throws, n_dice), endpoint=True)
    X = throws.sum(axis=1) / n_dice
    return stats.shapiro(X).pvalue

n_dice = int(input('Input the number of dice '))
n_max = int(input('Input the number of rolls '))

x = list(range(3, n_max + 1))  # Shapiro-Wilk needs at least 3 observations
y = [shapiro_p(n_dice, n) for n in x]

figure_title = f'Number of dice: {n_dice}'

plt.plot(x, y)
plt.hlines(0.05, 3, n_max + 1, colors='red')  # significance level 0.05
plt.xlabel('Number of rolls')
plt.ylabel('p')
plt.title(figure_title)
plt.grid(True)
plt.show()

The second piece of code:

from scipy import stats
import random

dice = list(range(1, 7))  # faces of a fair d6

def shapiro_p(n_dice, n_rolls):
    # Sum of n_dice dice per roll, repeated n_rolls times.
    sample = [sum(random.choices(dice, k=n_dice)) for _ in range(n_rolls)]
    return stats.shapiro(sample).pvalue

n_dice = int(input('Number of dice '))
n_rolls = int(input('Number of rolls in each test '))
# n_tests =  int(input('Number of tests '))
n_tests = 1000

null_h = 0
for n in range(n_tests):
    if shapiro_p(n_dice, n_rolls) >= 0.05:
        null_h += 1
print('Percentage of tests supporting the null hypothesis: ', 100*null_h/n_tests)
msuzen
1 Answer


The CLT is an asymptotic result, valid as $n\rightarrow \infty$. Namely, for iid random variables $X_i$ with mean $\mu$ and finite variance $\sigma^2<\infty$, define:

$$ S_n = \frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{n}(X_i-\mu) = \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}. $$ Then $S_{n} \xrightarrow{d} Z\sim \operatorname{N}(0, 1)$ as $n\rightarrow \infty$.
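As a quick sanity check of the quantities in this statement, here is a sketch (not from the original post; the seed and sample size are arbitrary choices) computing $\mu$, $\sigma^2$, and one standardized sample mean for a fair d6:

```python
import numpy as np

# Fair d6 (faces 1..6): population mean and variance.
faces = np.arange(1, 7)
mu = faces.mean()                   # 3.5
var = ((faces - mu) ** 2).mean()    # 35/12, about 2.9167
sigma = np.sqrt(var)

# One standardized sample mean, as in the CLT statement above.
rng = np.random.default_rng(0)      # fixed seed, arbitrary choice
n = 10_000
rolls = rng.integers(1, 7, size=n)  # 1..6 (high is exclusive)
S_n = np.sqrt(n) * (rolls.mean() - mu) / sigma
print(mu, var, S_n)                 # S_n behaves like one draw from N(0, 1)
```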

The CLT says nothing about finite $n$, but the Berry-Esseen theorem gives bounds on the error of the normal approximation.
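To make this concrete, a sketch evaluating the Berry-Esseen bound $C\rho/(\sigma^3\sqrt{n})$ for a fair d6. The value $C = 0.4748$ is one published upper bound on the universal constant, assumed here; the optimal constant is not known exactly:

```python
import numpy as np

# Berry-Esseen for iid sums:
#   sup_x |P(S_n <= x) - Phi(x)| <= C * rho / (sigma**3 * sqrt(n)),
# where rho = E|X - mu|^3 is the third absolute central moment.
faces = np.arange(1, 7)
mu = faces.mean()
sigma = np.sqrt(((faces - mu) ** 2).mean())
rho = (np.abs(faces - mu) ** 3).mean()   # 6.375 for a fair d6

C = 0.4748  # published upper bound on the constant (assumption)
for n in (10, 100, 1000, 10000):
    bound = C * rho / (sigma ** 3 * np.sqrt(n))
    print(n, round(bound, 4))  # shrinks like 1/sqrt(n), never exactly 0
```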

The distribution of the (appropriately scaled) sum is never exactly normal at finite $n$, except when the $X_i$ are normally distributed to begin with. For example: the support of the sum of $m$ d6 dice is $\{m, m+1, \ldots, 6m\}$, whereas the normal distribution has support $(-\infty, \infty)$. So we expect the $p$-value of the Shapiro-Wilk test to increase with an increasing number of dice, because the sum (or scaled sum) approximates the normal distribution better.
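A sketch illustrating this effect (hypothetical setup: 500 rolls, a fixed seed, and the median p-value over 50 repetitions to smooth sampling noise):

```python
import numpy as np
from scipy import stats

# At a fixed number of rolls, the Shapiro-Wilk p-value tends to grow
# with the number of dice, since the sum approximates a normal better.
rng = np.random.default_rng(42)
n_rolls = 500

def median_p(n_dice, reps=50):
    # Median p-value over several repetitions, to smooth sampling noise.
    ps = [stats.shapiro(rng.integers(1, 7, size=(n_rolls, n_dice)).sum(axis=1)).pvalue
          for _ in range(reps)]
    return float(np.median(ps))

p_by_dice = {m: median_p(m) for m in (1, 2, 5, 20)}
print(p_by_dice)  # p grows with the number of dice
```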

The second effect you observe is that the power of the Shapiro-Wilk test to detect deviations from normality increases with an increasing number of rolls. Hence, with a fixed $m$ (number of dice), the $p$-value of the Shapiro-Wilk test will tend to get smaller with an increasing number of rolls. That's what you see in the graph produced by the first piece of your Python code.
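This can be sketched by estimating the rejection rate directly, mirroring your second script (assumed setup: 20 dice, 200 tests per sample size, fixed seed):

```python
import numpy as np
from scipy import stats

# With the number of dice fixed, the Shapiro-Wilk rejection rate -- its
# power against this non-normal alternative -- climbs toward 1 as the
# number of rolls grows.
rng = np.random.default_rng(1)
n_dice = 20

def rejection_rate(n_rolls, n_tests=200, alpha=0.05):
    rejections = 0
    for _ in range(n_tests):
        sums = rng.integers(1, 7, size=(n_rolls, n_dice)).sum(axis=1)
        rejections += stats.shapiro(sums).pvalue < alpha
    return rejections / n_tests

rates = {n: rejection_rate(n) for n in (100, 500, 2000)}
print(rates)  # rejection rate increases with the number of rolls
```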

COOLSerdash
  • Thank you! I do not have enough reputation yet to upvote your answer, can only thank you in the comment. – Denis Kazakov Aug 27 '21 at 20:48
  • I am not sure that I agree with that last sentence. Wouldn't the power of the test depend on how many times you draw samples of size $n$, rather than on $n$ itself? In the extreme, consider rolling a billion dice, but only once. How do you test the central limit theorem to the sum when you only have one sum? – Dave Sep 09 '21 at 13:02
  • @Dave I think you're right. I used $n$ ambiguously for the sample size and for the number of dice. I will revise my answer. – COOLSerdash Sep 09 '21 at 13:50
  • @COOLSerdash (+1) Great answer. Reference to Berry-Esseen theorem is not that well known. Nice that it is discussed here. – msuzen Sep 19 '21 at 10:07