
I am trying to build a p-value calculation within Tableau for A/B tests and found a formula, but the formula doesn't work all the time.

Group A:
Visits - 268,886
Clicks - 38,882
Orders - 2,312

Group B:
Visits - 1,119,090
Clicks - 142,239
Orders - 9,913

CTR = Clicks/Visits
CVR = Orders/Visits

I am trying to calculate the p-value of the CTR and CVR difference between group A and group B.

I was able to get the z-score correctly.

CTR Z-score: 23.41
CVR Z-score: 1.31

And I found a formula for calculating p-value by using z-score:

1/(1+exp(-sqrt(PI())*(-0.0004406*[Zscore]^5+0.0418198*[Zscore]^3+.9*[Zscore])))

CTR p-value: 2 (doesn't seem right; it should be 0)

CVR p-value: 0.19 (it seems correct)

I am wondering if the formula above is correct or not. If it's not right, is there any other formula that I can use?

  • This formula is an approximation and likely is intended only for a narrow range of Z-values, which typically would run from about $2$ to $6$ in size. For (much) better approximations see our thread at https://stats.stackexchange.com/questions/7200/evaluate-definite-interval-of-normal-distribution/7206#7206. In your case, no calculations are needed: the 68-95-99.7 rule shows a Z-score of 1.31 has a high p-value; and any Z-score whose size is larger than 6 is going to be significant. 23.41 will give an astronomically small p-value. – whuber Dec 30 '19 at 15:36
  • It's helpful to let software calculate the *p* value for you. If you are considering a two-sided test, you could use the following R code at [this site](https://rdrr.io/snippets/). If you are using software like Excel, it should also have a function to derive the *p* value from the *z* value. `z = 1.31; 2*pnorm(-abs(z))` – Sal Mangiafico Dec 30 '19 at 16:58

1 Answer


First of all, I would suggest presenting confidence intervals instead of $p$-values. That said, I think the presented $z$ values are a bit off. Using $Z = \frac{p_b - p_a}{ \sqrt{p_p (1-p_p) (\frac{1}{n_a} + \frac{1}{n_b})}}$, where $p_p$ is the pooled proportion and $p_a$ and $p_b$ are the proportions observed in samples $a$ and $b$ respectively, we get $Z = 1.293905$ for the conversion rates. Calculating $Z$ with this pooled statistic lends itself naturally to one-sided hypothesis testing, as here, because we "strongly suspect" that any deviation will be positive.

gr_a_vt = 268886
gr_b_vt = 1119090
gr_a_or = 2312
gr_b_or = 9913

p_a = gr_a_or / gr_a_vt # 0.008598439
p_b = gr_b_or / gr_b_vt # 0.00885809
p = (gr_a_or + gr_b_or) / (gr_a_vt +  gr_b_vt) # 0.008807789

Z = (p_b-p_a)/(sqrt(p*(1-p)*((1/gr_a_vt)+(1/gr_b_vt)))) # 1.293905

This is in accordance with what we would get if we used R's prop.test to test our given proportions:

res = prop.test(x=c(gr_b_or, gr_a_or), n=c(gr_b_vt, gr_a_vt), 
                alternative = "greater", correct = FALSE)

#   2-sample test for equality of proportions without continuity correction
#
#   data:  c(gr_b_or, gr_a_or) out of c(gr_b_vt, gr_a_vt)
#   X-squared = 1.6742, df = 1, p-value = 0.09785
#   alternative hypothesis: greater
#   95 percent confidence interval:
#     -6.745774e-05  1.000000e+00
#   sample estimates:
#        prop 1      prop 2 
#   0.008858090 0.008598439

where the $Z$ values match our manual computation:

all.equal(Z, sqrt(res$statistic), check.attributes = FALSE) # TRUE # 1.293...

and to that effect the $p$-values do too:

all.equal(res$p.value, 1-pnorm(Z)) # TRUE # 0.09784...

Given the abundance of modern statistical software, I would urge you not to use a manual formula. Aside from R's pnorm, one can use Python's scipy.stats.norm.cdf, Excel's NORM.DIST, MATLAB's normcdf, etc. As @whuber mentioned, CV.SE already has a great thread on how to "Evaluate definite interval of normal distribution", with a number of approximations listed. If we use Börjesson & Sundberg's approximation, as reported in cardinal's answer there, we get:

((2*pi)^(-0.5) * exp(-Z^2/2)) / ((1-0.339)*Z + 0.339*sqrt(Z^2+5.51)) # 0.09792396

which is a reasonable approximation to our "true" $p$-value of $0.09784906$.
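If you only need the tail probability from a $z$ score and do not have a statistics library at hand, the exact value can also be obtained from the complementary error function, since $1-\Phi(z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$. A quick sketch in Python, using only the standard library (scipy.stats.norm.sf(z) would return the same number):

```python
# One-sided upper-tail p-value from a z score via the identity
# 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)); no external packages needed.
from math import erfc, sqrt

gr_a_vt, gr_a_or = 268886, 2312   # group A visits / orders
gr_b_vt, gr_b_or = 1119090, 9913  # group B visits / orders

p_a = gr_a_or / gr_a_vt
p_b = gr_b_or / gr_b_vt
p = (gr_a_or + gr_b_or) / (gr_a_vt + gr_b_vt)  # pooled proportion

z = (p_b - p_a) / sqrt(p * (1 - p) * (1 / gr_a_vt + 1 / gr_b_vt))
p_value = 0.5 * erfc(z / sqrt(2))  # exact 1 - Phi(z)

print(round(z, 6))        # 1.293905
print(round(p_value, 6))  # 0.097849
```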

Finally, as mentioned in the beginning, showing confidence intervals is probably more informative for reasoning about potential differences. A first treatment would be to use the "Wald" approximate confidence interval for $p_b - p_a$, i.e. $(p_b-p_a) \pm z_{\alpha/2} \sqrt{ \frac{p_a (1-p_a)}{n_a} + \frac{p_b (1-p_b)}{n_b}}$, where $z_{\alpha/2}$ denotes the $1-\alpha/2$ quantile of the standard normal distribution. Using, for example, $\alpha=0.05$, i.e. a 95% CI, we get:

(p_b-p_a) + c(-1.96,1.96) * sqrt(p_b*(1-p_b)/gr_b_vt + p_a*(1-p_a)/gr_a_vt)
# -0.0001301302  0.0006494313

which better conveys that the difference is most probably very small but positive (here $z_{\alpha/2} = 1.96$). This approximation and other handy ones can be found in Agresti & Caffo (2000), "Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures". Notice that this is a two-sided confidence interval; the one we got from prop.test above is one-sided because we only care whether $p_b$ is greater than $p_a$. In general, one-sided confidence intervals are a bit harder to explain to non-mathematicians, so I would not invest the time showing them. That said, if we wanted to recreate the manually computed two-sided CI above through prop.test, we could use:

prop.test(x=c(gr_b_or, gr_a_or), n=c(gr_b_vt, gr_a_vt), 
          correct = FALSE, conf.level = 0.95)

#   2-sample test for equality of proportions without continuity correction
#
#   data:  c(gr_b_or, gr_a_or) out of c(gr_b_vt, gr_a_vt)
#   X-squared = 1.6742, df = 1, p-value = 0.1957
#   alternative hypothesis: two.sided
#   95 percent confidence interval:
#    -0.0001301230  0.0006494242
#   sample estimates:
#        prop 1      prop 2 
#   0.008858090 0.008598439 
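As an aside, the Agresti & Caffo adjustment referenced above is easy to apply by hand: add one success and one failure to each sample, then compute the usual Wald interval with the adjusted counts. A small sketch in Python (with samples this large it essentially reproduces the Wald interval computed earlier):

```python
# Sketch of the Agresti-Caffo interval for p_b - p_a: add one success
# and one failure to each sample, then apply the usual Wald formula.
from math import sqrt

gr_a_vt, gr_a_or = 268886, 2312   # group A visits / orders
gr_b_vt, gr_b_or = 1119090, 9913  # group B visits / orders

# Adjusted proportions: +1 success and +1 failure per group
pa = (gr_a_or + 1) / (gr_a_vt + 2)
pb = (gr_b_or + 1) / (gr_b_vt + 2)

se = sqrt(pa * (1 - pa) / (gr_a_vt + 2) + pb * (1 - pb) / (gr_b_vt + 2))
lo, hi = (pb - pa) - 1.96 * se, (pb - pa) + 1.96 * se

print(lo, hi)  # roughly -0.00013 and 0.00065, matching the Wald CI here
```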
usεr11852