First of all, I would suggest presenting confidence intervals instead of $p$-values. That said, I think the presented $z$ values are a bit off. Using $Z = \frac{p_b - p_a}{ \sqrt{p_p (1-p_p) \left(\frac{1}{n_a} + \frac{1}{n_b}\right)}}$, where $p_p$ is the pooled proportion and $p_a$ and $p_b$ are the proportions observed in samples $a$ and $b$ respectively, we get $Z = 1.293905$. Calculating $Z$ with this pooled statistic is also more natural for a one-sided hypothesis test like the one here, since we "strongly suspect" that any deviation will be positive.
# observed counts per group: totals (vt) and successes (or)
gr_a_vt = 268886
gr_b_vt = 1119090
gr_a_or = 2312
gr_b_or = 9913
# sample proportions and pooled proportion
p_a = gr_a_or / gr_a_vt # 0.008598439
p_b = gr_b_or / gr_b_vt # 0.008858090
p = (gr_a_or + gr_b_or) / (gr_a_vt + gr_b_vt) # 0.008807789
# pooled two-proportion z statistic
Z = (p_b - p_a) / sqrt(p * (1 - p) * (1/gr_a_vt + 1/gr_b_vt)) # 1.293905
This is in accordance with what we would get if we used R's prop.test to test our given proportions:
res = prop.test(x=c(gr_b_or, gr_a_or), n=c(gr_b_vt, gr_a_vt),
alternative = "greater", correct = FALSE)
# 2-sample test for equality of proportions without continuity correction
#
# data: c(gr_b_or, gr_a_or) out of c(gr_b_vt, gr_a_vt)
# X-squared = 1.6742, df = 1, p-value = 0.09785
# alternative hypothesis: greater
# 95 percent confidence interval:
# -6.745774e-05 1.000000e+00
# sample estimates:
# prop 1 prop 2
# 0.008858090 0.008598439
where the $Z$ values match our manual computation:
all.equal(Z, sqrt(res$statistic), check.attributes = FALSE) # TRUE # 1.293...
and to that effect the $p$-values do too:
all.equal(res$p.value, 1-pnorm(Z)) # TRUE # 0.09784...
Given the abundance of modern statistical software, I would urge you not to evaluate the normal CDF with a manual approximation formula. Aside from R's pnorm, one can use Python's scipy.stats.norm.cdf, Excel's NORM.DIST, MATLAB's normcdf, etc. As @whuber mentioned, CV.SE already has a great thread on how to "Evaluate definite interval of normal distribution", where a number of approximations are listed. If we use Borjesson & Sundberg's approximation as reported in cardinal's answer, we get:
((2*pi)^(-0.5) * exp(-Z^2/2)) / ((1-0.339)*Z + 0.339*sqrt(Z^2+5.51)) # 0.09792396
which is a reasonable approximation to our "true" $p$-value of $0.09784906$.
Finally, as mentioned in the beginning, showing confidence intervals is probably quite informative when reasoning about potential differences. A first treatment would be to use the "Wald" approximate confidence interval for $p_b - p_a$ and get $(p_b-p_a) \pm z_{\alpha/2} \sqrt{ \frac{p_a (1-p_a)}{n_a} + \frac{p_b (1-p_b)}{n_b}}$, where $z_{\alpha}$ denotes the $1-\alpha$ quantile of the standard normal distribution. Using, for example, $\alpha = 0.05$, i.e. a 95% CI, we get:
(p_b-p_a) + c(-1.96,1.96) * sqrt(p_b*(1-p_b)/gr_b_vt + p_a*(1-p_a)/gr_a_vt)
# -0.0001301302 0.0006494313
which better conveys that most probably there is a very small but positive difference (here $z_{\alpha/2} = 1.96$). This approximation and other handy ones can be found in Agresti & Caffo (2000), "Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures". Notice that this is a two-sided confidence interval; the one we got from prop.test above is one-sided because we simply care to check whether $p_b$ is greater than $p_a$.
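As a quick sketch of the adjustment the Agresti & Caffo title refers to (add one success and one failure to each sample, i.e. two successes and two failures in total, then form the usual Wald interval with the adjusted proportions), using the counts from above:

```r
# counts from above
gr_a_or = 2312; gr_a_vt = 268886
gr_b_or = 9913; gr_b_vt = 1119090
# Agresti-Caffo adjustment: +1 success and +1 failure per sample
x_a = gr_a_or + 1; n_a = gr_a_vt + 2
x_b = gr_b_or + 1; n_b = gr_b_vt + 2
pt_a = x_a / n_a
pt_b = x_b / n_b
# adjusted Wald 95% CI for the difference of proportions
(pt_b - pt_a) + c(-1.96, 1.96) * sqrt(pt_a*(1-pt_a)/n_a + pt_b*(1-pt_b)/n_b)
```

With samples as large as these the adjustment is negligible and the interval is essentially the plain Wald interval; it mainly pays off for small counts, where the unadjusted interval has poor coverage.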
In general, one-sided confidence intervals are a bit harder to explain to non-mathematicians so I would not invest the time showing them.
That said, if we wanted to recreate the manually computed CIs shown above through prop.test, we could use:
prop.test(x=c(gr_b_or, gr_a_or), n=c(gr_b_vt, gr_a_vt),
correct = FALSE, conf.level = 0.95)
# 2-sample test for equality of proportions without continuity correction
#
# data: c(gr_b_or, gr_a_or) out of c(gr_b_vt, gr_a_vt)
# X-squared = 1.6742, df = 1, p-value = 0.1957
# alternative hypothesis: two.sided
# 95 percent confidence interval:
# -0.0001301230 0.0006494242
# sample estimates:
# prop 1 prop 2
# 0.008858090 0.008598439