I wrote a Python script that generates a population of yes/no votes, with 50% of the votes randomly set to yes.
Then I repeatedly take samples of 50 votes (10, 100, 1000, 10000, 100000 and 1000000 times) and test for each sample whether its confidence interval, at a confidence level of 95%, contains the population proportion.
I expected the ratio of CIs that contain the population proportion to all generated CIs to get closer and closer to 0.95, but here is what I get:
num of samples    ratio
--------------    --------
            10    1.0
           100    0.99
          1000    0.95
         10000    0.9366
        100000    0.93337
       1000000    0.935186
This looks like it converges to about 0.935 rather than 0.95.
Is this result plausible, or is there a bug in my program?
Some details about my procedure:
I calculate the confidence interval CI from the confidence level cl like this:
\begin{alignat*}{2} \text{CI}\; =\; \hat p\; &\pm\; &z^\star\:&\times\:\sqrt{\frac{\hat p \left(1 - \hat p\right)}{n}} \quad \text{with } z^\star \text{ corresponding to a confidence level of } 95\% \end{alignat*}
or in Python code:
import math
import pandas as pd
from scipy import stats

sigma_p_hat = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
cdf = 0.5 + cl / 2
z_star = stats.norm.ppf(cdf)  # two-sided critical value, ≈ 1.96 for cl = 0.95
E = z_star * sigma_p_hat      # margin of error
CI = pd.Interval(p_hat - E, p_hat + E, closed='both')
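For reference, here is a self-contained, stdlib-only sketch of the whole experiment (I have not seen the full original script, so the function and parameter names are my own, and `statistics.NormalDist` stands in for `scipy.stats.norm`):

```python
import random
from statistics import NormalDist

def coverage(num_samples, n=50, p_true=0.5, cl=0.95, seed=0):
    """Fraction of Wald CIs (sample size n) that contain p_true."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + cl / 2)  # ≈ 1.96 for cl = 0.95
    hits = 0
    for _ in range(num_samples):
        # draw one sample of n yes/no votes and compute the sample proportion
        p_hat = sum(rng.random() < p_true for _ in range(n)) / n
        e = z_star * (p_hat * (1 - p_hat) / n) ** 0.5  # margin of error
        if p_hat - e <= p_true <= p_hat + e:
            hits += 1
    return hits / num_samples

print(coverage(20000))  # empirical coverage over 20000 samples
```

Running this reproduces a coverage ratio noticeably below 0.95, consistent with the table above.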