Since accuracy, in this case, is the proportion of samples correctly classified, we can apply a hypothesis test for the difference between two proportions.
Let $\hat p_1$ and $\hat p_2$ be the accuracies obtained from classifiers 1 and 2 respectively, and let $n$ be the number of test samples given to each classifier. The numbers of samples correctly classified by classifiers 1 and 2 are $x_1$ and $x_2$ respectively.
$ \hat p_1 = x_1/n,\quad \hat p_2 = x_2/n$
The test statistic is given by
$\displaystyle Z = \frac{\hat p_1 - \hat p_2}{\sqrt{2\hat p(1 -\hat p)/n}}\qquad$ where $\quad\hat p= \dfrac{x_1+x_2}{2n}$
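The statistic can be computed directly from the correct-classification counts. The sketch below is a minimal Python version of the formula above; the function name `two_proportion_z` and the example counts (850 and 880 correct out of 1000) are hypothetical. Like the pooled test itself, it treats the two accuracies as independent samples.

```python
import math


def two_proportion_z(x1: int, x2: int, n: int) -> float:
    """Pooled two-proportion Z statistic for accuracies x1/n and x2/n.

    Hypothetical helper: assumes both classifiers are scored on n samples
    and, as the pooled test does, treats the two accuracies as independent.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p1_hat = x1 / n                    # accuracy of classifier 1
    p2_hat = x2 / n                    # accuracy of classifier 2
    p_hat = (x1 + x2) / (2 * n)        # pooled accuracy under H0: p1 = p2
    se = math.sqrt(2 * p_hat * (1 - p_hat) / n)  # standard error under H0
    if se == 0:
        raise ValueError("pooled accuracy is 0 or 1; Z is undefined")
    return (p1_hat - p2_hat) / se


# Hypothetical counts: 850 vs. 880 correct out of 1000 test samples.
z = two_proportion_z(850, 880, 1000)
print(round(z, 3))  # prints -1.963
```

A negative $Z$ means classifier 2 scored higher on this sample; whether that difference is significant is decided by the rejection region.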
Our aim is to show that the overall accuracy of classifier 2, $p_2$, is higher than that of classifier 1, $p_1$. This gives the hypotheses
- $H_0: p_1 = p_2\quad$ (null hypothesis: both accuracies are equal)
- $H_a: p_1 < p_2\quad$ (alternative hypothesis: the new classifier 2 is more accurate than the existing classifier 1)
The rejection region is given by
$Z < -z_\alpha \quad$ (if true, reject $H_0$ in favour of $H_a$)
where $z_\alpha$ is the critical value of the standard normal distribution for the level of significance $\alpha$, i.e. $P(Z > z_\alpha) = \alpha$. For instance, $z_{0.05} = 1.645$ for a 5% level of significance. So if $Z < -1.645$, we reject $H_0$ at the 5% significance level (a confidence level of $1-\alpha = 95\%$) and conclude that classifier 2 is more accurate than classifier 1.
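The decision rule can be sketched with Python's standard-library `statistics.NormalDist`, which supplies the critical value $z_\alpha$ via `inv_cdf(1 - alpha)`. The helper names `reject_h0` and `one_sided_p_value` are hypothetical, and the p-value function is an addition not in the original derivation: rejecting when $Z < -z_\alpha$ is equivalent to rejecting when $P(Z \le z) < \alpha$.

```python
from statistics import NormalDist


def reject_h0(z: float, alpha: float = 0.05) -> bool:
    """Left-tailed rule for H0: p1 = p2 vs. Ha: p1 < p2 (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # P(Z > z_alpha) = alpha
    return z < -z_alpha


def one_sided_p_value(z: float) -> float:
    """P(Z <= z) under H0 for the left-tailed alternative (hypothetical helper)."""
    return NormalDist().cdf(z)


print(round(NormalDist().inv_cdf(0.95), 3))  # prints 1.645
print(reject_h0(-1.963))                     # prints True
print(reject_h0(-1.2))                       # prints False
```

Using `inv_cdf` instead of a hard-coded 1.645 lets the same rule be reused at other significance levels, e.g. `reject_h0(z, alpha=0.01)`.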
References:
- [1] R. Johnson and J. Freund, *Miller and Freund’s Probability and Statistics for Engineers*, 8th ed. Prentice Hall International, 2011. (Primary source)
- [2] *Test of Hypothesis: Concise Formula Summary*. (Adapted from [1])