The tag info for McNemar's test (once supplied by me, and later possibly modified):
A repeated-measures test for categorical data. Given that two
variables with the same 2 categories (McNemar test) or k categories
(McNemar-Bowker test) form a square contingency table, the test's
question is whether population proportion in every off-diagonal cell
is equal to that in the symmetric cell. The 2x2
McNemar's test can be seen also as a "marginal homogeneity" test.
Please note first of all that McNemar's / McNemar-Bowker test is generally not meant to be a test of marginal homogeneity. There exists a marginal homogeneity test for a CxC table, and it is not the same as the McNemar-Bowker test of axial symmetry in a CxC table.
But in the specific case of a 2x2 table, McNemar's symmetry test is also a test of marginal homogeneity: the two marginal proportions share a diagonal cell, so they are equal exactly when the two off-diagonal proportions are equal. (To add, the above-mentioned CxC marginal homogeneity test is actually based internally on repeated application of McNemar's test to all possible 2x2 subtables.) (One more note: the 2x2 McNemar's test is equivalent to the Sign test performed on dichotomous data, and both can be made to return an exact or an asymptotic p-value.)
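To make the Sign test equivalence concrete, here is a minimal sketch in plain Python. The function name `mcnemar_2x2` and the counts are illustrative assumptions, not taken from any library. The exact p-value is simply a two-sided binomial (Sign) test with success probability 1/2 on the discordant pairs, and the asymptotic one is the usual chi-square with 1 degree of freedom, here without continuity correction.

```python
from math import comb, erfc, sqrt

def mcnemar_2x2(b, c):
    """McNemar's test from the two off-diagonal (discordant) counts b and c.

    The diagonal counts are not arguments at all: they never enter the test.
    Returns (chi-square statistic, asymptotic p-value, exact two-sided p-value).
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs, the test is undefined")
    # Asymptotic version: (b - c)^2 / (b + c), chi-square with 1 df,
    # no continuity correction. For 1 df, the upper tail is erfc(sqrt(x / 2)).
    chi2 = (b - c) ** 2 / n
    p_asymptotic = erfc(sqrt(chi2 / 2))
    # Exact version: Sign test, i.e. Binomial(n, 1/2) on the smaller count,
    # with the one-sided tail doubled and capped at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return chi2, p_asymptotic, p_exact

# Hypothetical counts: 15 pairs changed one way, 5 the other way.
chi2, p_asymptotic, p_exact = mcnemar_2x2(15, 5)
print(chi2, p_asymptotic, p_exact)
```

Whatever the diagonal cells contain, the result is the same, because only `b` and `c` are used.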
So, you are both right (in a specific case) and not so right (in general) in saying that McNemar's test is a marginal homogeneity test. First and foremost, it is a test of axial symmetry.
It is used in pre-post studies or matched-pair studies to compare symmetric frequencies in the table; the row and the column categories must be the same entities. The H0 is that in the population all off-diagonal proportions are equal to their symmetric cell proportions, vs the H1 that at least one proportion differs from its symmetric one.
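In symbols, the hypotheses and the usual asymptotic statistic can be written as below. The notation is my own assumption: $p_{ij}$ is the population proportion and $n_{ij}$ the observed count in row $i$, column $j$ of the $k \times k$ table.

```latex
H_0:\; p_{ij} = p_{ji} \quad \text{for all } i < j
\qquad \text{vs} \qquad
H_1:\; p_{ij} \neq p_{ji} \quad \text{for at least one pair } i < j

X^2 = \sum_{i<j} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}},
\qquad
X^2 \sim \chi^2_{k(k-1)/2} \ \text{asymptotically under } H_0

% For k = 2 this reduces to the 2x2 McNemar statistic with 1 df:
X^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}
```

No diagonal count $n_{ii}$ appears anywhere in the statistic.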
It is not strange therefore that an off-diagonal symmetry test ignores diagonal entries altogether.
But there is another repeated-measures categorical test for a CxC contingency table with the same row/column categories, one which does take the diagonal entries into account: the well-known Cohen's kappa statistic and test. Use it if you want to consider the diagonal. But it tests a different hypothesis: H0 = diagonal and off-diagonal proportions are even, vs H1 = off-diagonal proportions dominate (the diagonal is a canyon) or diagonal proportions dominate (the diagonal is a ridge). Kappa does not specifically consider symmetric cells.
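A small sketch of the kappa statistic shows how it reacts to the diagonal. The function name `cohen_kappa` and the tables are hypothetical; the formula is the standard one, (observed agreement - chance agreement) / (1 - chance agreement).

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (list of equal-length rows)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: share of observations on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of row margin times column margin.
    row_margins = [sum(table[i]) / n for i in range(k)]
    col_margins = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_chance = sum(row_margins[i] * col_margins[i] for i in range(k))
    return (p_observed - p_chance) / (1 - p_chance)

# Three hypothetical tables with the SAME off-diagonal cells (10 and 10),
# so McNemar's test treats them identically, but with different diagonals.
ridge = [[40, 10], [10, 40]]      # diagonal dominates
big_ridge = [[400, 10], [10, 400]]
flat = [[10, 10], [10, 10]]       # diagonal and off-diagonal even

print(cohen_kappa(ridge), cohen_kappa(big_ridge), cohen_kappa(flat))
```

A table such as `[[10, 40], [40, 10]]`, where the off-diagonal dominates (the "canyon"), gives a negative kappa.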
@ted's intuition about McNemar's,
"But the difference looks a lot like noise if the total number of
observations is 1MM",
is misplaced. To repeat: the diagonal entries in McNemar's test (they would be called "ties" in the terminology of the Sign test) are conceptually outside its test hypothesis. The hypothesis is about the binomial question "who wins statistically more often, A or B, or is it about a draw by count?". So the diagonal entries, the ties, are treated as "no answer" or "don't know" responses, and hence as observations irrelevant to the experiment. They should be excluded from the sample at the time of testing. Although they are irrelevant to the H0/H1, they are relevant to the test's power, since, being excluded, they diminish the effective sample size on which the test is based. Instead of excluding them, you might choose to assign the ties at random, with probability 1/2, either to "A wins" or to "B wins", i.e. to treat ties as "data lost by chance". This approach will not bias McNemar's test but will weaken its power.
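The effect of the two ways of handling ties can be checked with a small simulation. This is only a sketch under assumed settings: the population proportions, sample size, number of replications and the helper names `mcnemar_p` and `rejection_rate` are all made up for illustration, and the asymptotic (uncorrected) test is used for speed.

```python
import random
from math import erfc, sqrt

def mcnemar_p(b, c):
    """Asymptotic McNemar p-value (chi-square, 1 df, no continuity correction)."""
    n = b + c
    if n == 0:
        return 1.0
    return erfc(sqrt((b - c) ** 2 / n / 2))

def rejection_rate(p_a, p_b, split_ties, n_obs=1000, reps=300, alpha=0.05, seed=1):
    """Share of simulated samples in which H0 is rejected at level alpha.

    Each observation is "A wins" with probability p_a, "B wins" with
    probability p_b, and a tie (diagonal cell) otherwise.
    split_ties=False drops the ties; split_ties=True assigns every tie
    to A or B by a fair coin flip before testing.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        b = c = ties = 0
        for _ in range(n_obs):
            u = rng.random()
            if u < p_a:
                b += 1
            elif u < p_a + p_b:
                c += 1
            else:
                ties += 1
        if split_ties:
            to_a = sum(rng.random() < 0.5 for _ in range(ties))
            b, c = b + to_a, c + (ties - to_a)
        if mcnemar_p(b, c) < alpha:
            rejections += 1
    return rejections / reps

# A real effect (A wins twice as often as B), with 94% ties.
power_excluded = rejection_rate(0.04, 0.02, split_ties=False)
power_split = rejection_rate(0.04, 0.02, split_ties=True)
print("power, ties excluded:", power_excluded)
print("power, ties split at random:", power_split)
```

Calling `rejection_rate(0.03, 0.03, split_ties=True)` gives the rejection rate when H0 is true; it should stay near `alpha`, which is what "will not bias the test" means here.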
But if you need to include the diagonal in your test concept (specifically, that under H0 there is an even chance to fall into any off-diagonal cell as well as into any diagonal cell), then McNemar's test shouldn't interest you. Choose kappa, for example, or some other criterion/test. There are a number of them designed specifically to compare classification performance.
Comparing two classifiers is like comparing two rates. The inclusion of an observation in a diagonal cell such as a
is an effective result of the work of the classifiers. Logically, it should be taken into account, as in kappa. But McNemar's test is primarily for repeated-measures settings on the same set of observations. Those that found themselves in cell a
just remained indifferent to the effects of the factor; and as long as the test's question is what the direction of the effect is whenever it exists, that cell can't help answer it.
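For the classifier setting, the 2x2 table could be tallied as in the sketch below. The cell layout is an assumption on my part (a = both classifiers correct, d = both wrong, b and c = exactly one correct), and the labels and predictions are made-up data.

```python
def paired_correctness_table(y_true, pred_1, pred_2):
    """Cross-tabulate correctness of two classifiers on the same cases.

    Assumed layout: a = both correct, b = only classifier 1 correct,
    c = only classifier 2 correct, d = both wrong.
    """
    a = b = c = d = 0
    for truth, p1, p2 in zip(y_true, pred_1, pred_2):
        ok_1, ok_2 = (p1 == truth), (p2 == truth)
        if ok_1 and ok_2:
            a += 1
        elif ok_1:
            b += 1
        elif ok_2:
            c += 1
        else:
            d += 1
    return a, b, c, d

# Hypothetical labels and predictions for eight cases.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
pred_1 = [1, 0, 0, 1, 0, 1, 1, 0]
pred_2 = [1, 0, 1, 0, 0, 1, 1, 1]

a, b, c, d = paired_correctness_table(y_true, pred_1, pred_2)
print(a, b, c, d)
```

McNemar's test would then use only `b` and `c`; `a` and `d` describe how often the classifiers agree, which is what a kappa-type measure picks up.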