I am trying to use the McNemar-Bowker test to compare the performance of two classifiers. My contingency table is sparse, and for some pairs of symmetric off-diagonal cells the sum n_ij + n_ji is less than 10. I am therefore using the exact McNemar-Bowker test as implemented in `nominalSymmetryTest` (rcompanion package):
```r
library(rcompanion)  # provides nominalSymmetryTest()

data <- c( 0,   0,    0,    0,    0,   0,   0,  0,  0, 0,
          23, 253,   35,    0,    0,   0,   0,  0,  0, 0,
           9, 299, 1510,  329,    7,   0,   0,  0,  0, 0,
           0,   1,  289, 1193,  136,   3,   0,  0,  0, 0,
           0,   0,   35,  403, 4437, 338,   1,  0,  0, 0,
           0,   0,    0,   15,   70, 692, 114,  7,  1, 0,
           0,   0,    0,    0,    3,  50,  87, 18,  0, 0,
           0,   0,    0,    0,    1,  14,  57, 35, 15, 1,
           0,   0,    0,    0,    2,   2,  16, 12,  1, 0,
           0,   0,    0,    1,    0,   3,  31, 33, 12, 3)
labels <- as.character(1:10)
ada_cat <- matrix(data, nrow = 10, ncol = 10, byrow = TRUE,
                  dimnames = list(labels, labels))

nominalSymmetryTest(ada_cat,
                    digits     = 3,
                    MonteCarlo = TRUE,
                    exact      = TRUE,
                    ntrial     = 100000)
```
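For comparison, the asymptotic Bowker statistic is the sum over pairs i < j of (n_ij - n_ji)^2 / (n_ij + n_ji), referred to a chi-squared distribution with one degree of freedom per pair. Base R's `stats::mcnemar.test` computes this for k x k tables, but it always uses all k(k - 1)/2 pairs, so a pair with n_ij = n_ji = 0 contributes 0/0 and the statistic comes out as NaN; in this table 22 of the 45 pairs are empty. Below is a minimal sketch, assuming that empty pairs can simply be dropped (they carry no information about asymmetry) and the degrees of freedom reduced accordingly. `bowker_nonempty()` is a hypothetical helper, not part of rcompanion. Its optional Monte Carlo p-value uses a scheme I chose for illustration: conditional on each pair total, n_ij is drawn from Binomial(n_ij + n_ji, 1/2), its null distribution under symmetry. This is not necessarily what `nominalSymmetryTest(..., MonteCarlo = TRUE)` does internally.

```r
# Hypothetical helper, NOT part of rcompanion: Bowker's symmetry statistic
# computed only over off-diagonal pairs (i, j) with n_ij + n_ji > 0.
bowker_nonempty <- function(tab, n_sim = 0, seed = 1) {
  stopifnot(is.matrix(tab), nrow(tab) == ncol(tab))
  up   <- upper.tri(tab)
  nij  <- tab[up]              # n_ij for i < j
  tot  <- nij + t(tab)[up]     # n_ij + n_ji for the same pairs
  keep <- tot > 0              # drop pairs that are empty in both cells
  nij  <- nij[keep]
  tot  <- tot[keep]

  # (n_ij - n_ji)^2 / (n_ij + n_ji), using n_ji = tot - n_ij
  stat_fun <- function(a) sum((2 * a - tot)^2 / tot)
  stat   <- stat_fun(nij)
  df     <- length(tot)        # one degree of freedom per non-empty pair
  p_asym <- pchisq(stat, df = df, lower.tail = FALSE)

  p_mc <- NA_real_
  if (n_sim > 0) {
    # Assumed Monte Carlo scheme: under symmetry, and conditional on the
    # pair totals, each n_ij ~ Binomial(tot, 1/2) independently.
    set.seed(seed)
    sims <- replicate(n_sim, stat_fun(rbinom(length(tot), tot, 0.5)))
    p_mc <- (sum(sims >= stat) + 1) / (n_sim + 1)
  }
  list(statistic = stat, df = df,
       p.asymptotic = p_asym, p.monte.carlo = p_mc)
}

data <- c( 0,   0,    0,    0,    0,   0,   0,  0,  0, 0,
          23, 253,   35,    0,    0,   0,   0,  0,  0, 0,
           9, 299, 1510,  329,    7,   0,   0,  0,  0, 0,
           0,   1,  289, 1193,  136,   3,   0,  0,  0, 0,
           0,   0,   35,  403, 4437, 338,   1,  0,  0, 0,
           0,   0,    0,   15,   70, 692, 114,  7,  1, 0,
           0,   0,    0,    0,    3,  50,  87, 18,  0, 0,
           0,   0,    0,    0,    1,  14,  57, 35, 15, 1,
           0,   0,    0,    0,    2,   2,  16, 12,  1, 0,
           0,   0,    0,    1,    0,   3,  31, 33, 12, 3)
ada_cat <- matrix(data, nrow = 10, ncol = 10, byrow = TRUE)

res <- bowker_nonempty(ada_cat, n_sim = 10000)
res$df             # 23 for this table
res$p.asymptotic
res$p.monte.carlo
```

With `n_sim` simulations the Monte Carlo p-value cannot be smaller than 1/(n_sim + 1), so a result at that floor only says that the true p-value is very small.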
The results look like this:
```
$Global.test.for.symmetry
  Dimensions p.value
1    10 x 10      NA

$Pairwise.symmetry.tests
    Comparison  p.value p.adjust
1    1/1 : 2/2 2.38e-07 9.12e-07
2    1/1 : 3/3  0.00391 7.49e-03
3    1/1 : 4/4     <NA>       NA
4    1/1 : 5/5     <NA>       NA
5    1/1 : 6/6     <NA>       NA
6    1/1 : 7/7     <NA>       NA
7    1/1 : 8/8     <NA>       NA
8    1/1 : 9/9     <NA>       NA
9  1/1 : 10/10     <NA>       NA
10   2/2 : 3/3 2.12e-53 4.88e-52
11   2/2 : 4/4        1 1.00e+00
12   2/2 : 5/5     <NA>       NA
13   2/2 : 6/6     <NA>       NA
14   2/2 : 7/7     <NA>       NA
15   2/2 : 8/8     <NA>       NA
16   2/2 : 9/9     <NA>       NA
17 2/2 : 10/10     <NA>       NA
18   3/3 : 4/4    0.117 1.92e-01
19   3/3 : 5/5 1.51e-05 3.86e-05
20   3/3 : 6/6     <NA>       NA
21   3/3 : 7/7     <NA>       NA
22   3/3 : 8/8     <NA>       NA
23   3/3 : 9/9     <NA>       NA
24 3/3 : 10/10     <NA>       NA
25   4/4 : 5/5 1.11e-31 8.51e-31
26   4/4 : 6/6  0.00754 1.33e-02
27   4/4 : 7/7     <NA>       NA
28   4/4 : 8/8     <NA>       NA
29   4/4 : 9/9     <NA>       NA
30 4/4 : 10/10        1 1.00e+00
31   5/5 : 6/6  3.3e-43 3.80e-42
32   5/5 : 7/7    0.625 7.99e-01
33   5/5 : 8/8        1 1.00e+00
34   5/5 : 9/9      0.5 6.76e-01
35 5/5 : 10/10     <NA>       NA
36   6/6 : 7/7 6.33e-07 2.08e-06
37   6/6 : 8/8    0.189 2.90e-01
38   6/6 : 9/9        1 1.00e+00
39 6/6 : 10/10     0.25 3.59e-01
40   7/7 : 8/8 7.24e-06 2.08e-05
41   7/7 : 9/9 3.05e-05 7.02e-05
42 7/7 : 10/10 9.31e-10 5.35e-09
43   8/8 : 9/9    0.701 8.49e-01
44 8/8 : 10/10 4.07e-09 1.87e-08
45 9/9 : 10/10 0.000488 1.02e-03

$p.adjustment
  Method
1    fdr

$statistical.method
         Method
1 binomial test
```
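The pairwise table can be cross-checked with base R. According to `$statistical.method` and `$p.adjustment`, each comparison i/i : j/j is a binomial test, and the p-values are adjusted with the "fdr" (Benjamini-Hochberg) method. For the first row, n_12 = 0 and n_21 = 23, and an exact two-sided binomial test of 0 successes out of 23 with p = 0.5 gives 2 x 0.5^23, about 2.38e-07, which matches the printed p-value; rows where n_ij = n_ji = 0 have nothing to test and show NA. The sketch below assumes that this is the whole pairwise procedure; `pairwise_symmetry()` is a hypothetical helper, not rcompanion's implementation.

```r
# Hypothetical helper, NOT rcompanion's code: for every pair i < j, an exact
# two-sided binomial (sign) test of n_ij vs n_ji with p = 0.5, followed by a
# Benjamini-Hochberg ("fdr") adjustment over the pairs that could be tested.
pairwise_symmetry <- function(tab) {
  idx <- which(upper.tri(tab), arr.ind = TRUE)
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]  # 1:2, 1:3, ...
  out <- data.frame(i = idx[, "row"], j = idx[, "col"],
                    n_ij = tab[idx], n_ji = tab[idx[, c("col", "row")]])
  out$p.value <- mapply(function(a, b) {
    if (a + b == 0) return(NA_real_)   # both cells empty: nothing to test
    binom.test(a, a + b, p = 0.5)$p.value
  }, out$n_ij, out$n_ji)
  # Number of comparisons = number of pairs that were actually tested
  out$p.adjust <- p.adjust(out$p.value, method = "fdr",
                           n = sum(!is.na(out$p.value)))
  out
}

data <- c( 0,   0,    0,    0,    0,   0,   0,  0,  0, 0,
          23, 253,   35,    0,    0,   0,   0,  0,  0, 0,
           9, 299, 1510,  329,    7,   0,   0,  0,  0, 0,
           0,   1,  289, 1193,  136,   3,   0,  0,  0, 0,
           0,   0,   35,  403, 4437, 338,   1,  0,  0, 0,
           0,   0,    0,   15,   70, 692, 114,  7,  1, 0,
           0,   0,    0,    0,    3,  50,  87, 18,  0, 0,
           0,   0,    0,    0,    1,  14,  57, 35, 15, 1,
           0,   0,    0,    0,    2,   2,  16, 12,  1, 0,
           0,   0,    0,    1,    0,   3,  31, 33, 12, 3)
ada_cat <- matrix(data, nrow = 10, ncol = 10, byrow = TRUE)

pw <- pairwise_symmetry(ada_cat)
pw[!is.na(pw$p.value), ]   # the 23 pairs with at least one discordant count
```

Each pairwise test asks only whether, among the items where one classifier says i and the other says j, the split between the two directions departs from 50/50. The adjusted values above are computed from unrounded p-values, so they can differ from the printed output in the last digit: the first row's p-value is the 6th smallest of 23, and 2.384e-07 x 23 / 6 is about 9.14e-07, whereas the output shows 9.12e-07, which is what 2.38e-07 x 23 / 6 gives.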
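I am having difficulty interpreting these results. In particular, the global test for symmetry returns NA, and many of the pairwise comparisons are NA as well. Can I conclude from these results whether the difference in performance between the two classifiers is statistically significant?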
Thank you!