
I know that I can get the p-value from a z-statistic (called zval here) using the following expression (Python):

pval = 2*(scipy.stats.norm.sf(abs(zval)))

or with:

pval = 2*(1 - scipy.stats.norm.cdf(abs(zval)))

where sf is the survival function and cdf is the cumulative distribution function. Documentation for these is here.

However, it is not clear to me for which other statistics this method can be used. For example, can it be used for the t-statistic in Student's t-test?

Some theory behind the above would be much appreciated.

Christoph Hanck
rnso
    I do not think this question should be closed. While it does depart from code, it clearly asks about theoretical development, which I think is firmly on topic here. – Christoph Hanck Jul 06 '20 at 14:07

1 Answer


To construct a level $\alpha$ rejection region we first calculate the level $\alpha$ critical value $c_\alpha$. For a two-tailed test based on a test statistic that is $N(0,1)$ under $H_0$, the critical value is defined implicitly by \begin{equation}\label{deficritval}\tag{1} 1-\alpha/2=\Phi(c_\alpha) \end{equation} where $\Phi$ denotes the standard normal CDF. Hence, $$ \Phi^{-1}(1-\alpha/2)=c_\alpha $$ where $\Phi^{-1}$ denotes the quantile function.

The probability that $z > c_\alpha$ is $1 - (1 -\alpha/2) = \alpha/2$, and likewise, $P(z < -c_\alpha)=\alpha/2$, by symmetry. Thus, $P(|z| > c_\alpha)=\alpha$, as desired. For example, when $\alpha = 0.05$, $\Phi^{-1}(1-\alpha/2)\approx 1.96$.
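As a quick numerical check of the critical value and the two tail probabilities, here is a minimal sketch using `scipy.stats.norm`. The variable names (`c_alpha`, `upper_tail`, and so on) are mine, chosen only for illustration.

```python
from scipy.stats import norm

alpha = 0.05

# Critical value: c_alpha = Phi^{-1}(1 - alpha/2), via the quantile function (ppf)
c_alpha = norm.ppf(1 - alpha / 2)

# Tail probabilities on each side of the rejection region
upper_tail = norm.sf(c_alpha)    # P(z > c_alpha)
lower_tail = norm.cdf(-c_alpha)  # P(z < -c_alpha)
two_sided = upper_tail + lower_tail

print(round(c_alpha, 2))  # 1.96
print(upper_tail, lower_tail, two_sided)
```

Each tail carries (up to floating-point error) $\alpha/2$, so the two together give the level $\alpha$.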

The $p$-value is defined as the smallest level for which a test based on an observed statistic $\hat{z}$ rejects $H_0$.

For a two-tailed test, $$ p(\hat{z}) = 2(1- \Phi(|\hat{z}|)) $$ To see this, note that the test based on $\hat{z}$ rejects if $$|\hat{z}| > c_\alpha$$ This is equivalent to $$\Phi(|\hat{z}|) > \Phi(c_\alpha),$$ because $\Phi$ is strictly increasing. Further, from eq. \eqref{deficritval} $$ \Phi(c_\alpha)=1-\alpha/2 $$ The smallest value of $\alpha$ for which the inequality holds is thus obtained by solving the equation $$\Phi(|\hat{z}|) = 1-\alpha/2$$ for $\alpha$, which gives $2(1- \Phi(|\hat{z}|))$.
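The "smallest level that rejects" reading can be checked numerically. The sketch below is only an illustration: the helper names `two_sided_p_value` and `rejects` and the value $\hat{z} = 2.3$ are my own choices, not part of any library.

```python
from scipy.stats import norm

def two_sided_p_value(z_hat):
    """p(z_hat) = 2 * (1 - Phi(|z_hat|)), computed with the survival function."""
    return 2 * norm.sf(abs(z_hat))

def rejects(z_hat, alpha):
    """Two-tailed level-alpha test: reject if |z_hat| > c_alpha."""
    c_alpha = norm.ppf(1 - alpha / 2)
    return abs(z_hat) > c_alpha

z_hat = 2.3  # illustrative observed statistic
p = two_sided_p_value(z_hat)

# A level slightly above p rejects; a level slightly below p does not.
rejects_above = rejects(z_hat, 1.01 * p)
rejects_below = rejects(z_hat, 0.99 * p)
print(p, rejects_above, rejects_below)

# The two forms from the question agree for moderate z, but 1 - cdf
# loses all precision far in the tail, where sf stays accurate.
p_sf = 2 * norm.sf(10.0)
p_cdf = 2 * (1 - norm.cdf(10.0))
print(p_sf, p_cdf)
```

This is also why the `sf` version in the question is usually the safer one: for very large $|\hat{z}|$, `1 - cdf` rounds to exactly zero in double precision.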

Hence, we require that the test statistic be $N(0,1)$ under the null and that we reject for both very negative and very positive values of the test statistic (i.e., conduct a two-tailed test).

Whether the result applies to Student's t-test therefore depends on the null distribution you entertain. If you can make a normality assumption on the data to which you apply the test (see e.g. here for what that means more precisely in a regression context), it is well known that the t-statistic follows a t-distribution under the null, with degrees of freedom determined by the sample size and the model. Hence, you would need to replace $\Phi$ with the c.d.f. of the corresponding t-distribution. The same argument goes through because the t-distribution is also symmetric around zero.
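For the one-sample t-test, replacing $\Phi$ with the t c.d.f. looks as follows. This is a sketch under assumptions of my own: the simulated sample, the seed, and the null value `mu0` are made up for illustration, and the degrees of freedom are $n-1$ because this is the one-sample case.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)               # arbitrary seed
x = rng.normal(loc=0.3, scale=1.0, size=12)  # made-up normal sample
mu0 = 0.0                                    # hypothesized mean under H0

n = x.size
t_hat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
df = n - 1  # one-sample t-test

# Same formula as before, with the t survival function instead of the normal one
p_t = 2 * stats.t.sf(abs(t_hat), df)

# Using the normal distribution here would understate the p-value
p_normal = 2 * stats.norm.sf(abs(t_hat))

# Cross-check against SciPy's built-in test
res = stats.ttest_1samp(x, popmean=mu0)
print(p_t, res.pvalue, p_normal)
```

The manual p-value matches the one reported by `scipy.stats.ttest_1samp`, while the normal-based value is smaller because the t-distribution has heavier tails.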

On the other hand, even without a normality assumption, the t-statistic will usually be approximately standard normal under the null in large samples, thanks to a central limit theorem, so that the normal-based formula is then asymptotically justified. See e.g. here.
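A small simulation can illustrate the large-sample claim. Everything here is an assumption made for the sake of the example: exponential data (clearly non-normal, with true mean 1), the sample sizes 5 and 200, the number of replications, and the seed. The helper `rejection_rate` is hypothetical, not a library function.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)  # arbitrary seed
alpha = 0.05
c_alpha = norm.ppf(1 - alpha / 2)

def rejection_rate(n, reps=10_000):
    """Share of simulated t-statistics with |t| > c_alpha when H0 is true."""
    x = rng.exponential(scale=1.0, size=(reps, n))  # true mean is 1
    t = (x.mean(axis=1) - 1.0) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    return np.mean(np.abs(t) > c_alpha)

rate_small = rejection_rate(5)
rate_large = rejection_rate(200)
print(rate_small, rate_large)
```

With the small sample, the normal critical value rejects far more often than the nominal 5%; with the larger sample, the rejection rate should be close to 5%, which is what the asymptotic argument predicts.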

Christoph Hanck