
Consider a study to examine whether food frequency questionnaires and three-day food diaries are equally likely to label a woman as consuming less than the RDA of calcium. Because calcium intake can vary considerably from person to person, both instruments are used to evaluate a single sample of women. A sample of 117 women was evaluated, and the results are as follows (0: consuming less than RDA; 1: consuming at least RDA):

[2×2 table of paired classifications (food frequency questionnaire vs. three-day diet record) not reproduced here]

From the answer key: Null hypothesis: the classification by food frequency is independent of the classification by Diet Record. Based on McNemar's test, the p-value = 0.03, which is less than 0.05. We therefore reject the null hypothesis and conclude that the classification by food frequency is dependent on the classification by Diet Record.
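For concreteness, here is a minimal sketch (assuming Python with statsmodels) of how such a test could be run; the cell counts below are made up purely for illustration, since the actual table is not reproduced here:

```python
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired classifications (0: < RDA, 1: >= RDA)
# rows = food frequency questionnaire, columns = diet record
table = [[15, 25],   # made-up counts, not the study's data
         [ 5, 55]]

result = mcnemar(table, exact=True)     # exact binomial test on the discordant cells
print(result.statistic, result.pvalue)  # only the off-diagonal counts (25 and 5) matter
```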

I am confused about why these hypotheses can be written in terms of dependence/independence of the tests. I thought tests being dependent meant that if you get a positive result from one test, you are more likely to get a positive result from the other test, too. But couldn't this still be true under the null hypothesis? For example, if the tests had virtually the same sensitivity/rate of positive result, the cells for discordant pairs should be approximately equal (p=1/2) and the null not rejected. But the tests could still be dependent, because of the underlying disease/characteristic causing positive result in both tests. Can someone please explain where my thinking has gone wrong? Thank you!
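As a sketch of this concern (assuming Python with numpy and statsmodels; all numbers are arbitrary), two tests driven by the same underlying status can be strongly correlated while their disagreements remain symmetric, so McNemar's test would typically not reject:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
n = 10_000
truth = rng.random(n) < 0.4                            # underlying characteristic
# each test reports the truth except for the same symmetric error rate
test_a = np.where(rng.random(n) < 0.9, truth, ~truth)
test_b = np.where(rng.random(n) < 0.9, truth, ~truth)

table = np.array([[np.sum(~test_a & ~test_b), np.sum(~test_a & test_b)],
                  [np.sum( test_a & ~test_b), np.sum( test_a & test_b)]])
print(np.corrcoef(test_a, test_b)[0, 1])   # clearly positive: the tests are dependent
print(mcnemar(table, exact=False).pvalue)  # typically large: McNemar's null is not rejected
```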

mintypasta
  • Where is this example coming from? Where did you get the expectation that it measures dependency? I share your confusion, so understanding where the expectation arises from is necessary to address the question. See https://www.theanalysisfactor.com/difference-between-chi-square-test-and-mcnemar-test/. – ReneBt Apr 11 '20 at 01:57
  • I am taking a university intro to biostatistics course. This was a quiz question, and the expectation comes from the answer key. I copied that part directly from the answer key. – mintypasta Apr 11 '20 at 15:08

1 Answer


What is independence?

The statistical definition of independence between two variables (e.g. two measures of an outcome of interest) is that the value observed for one variable does not affect the distribution of the other. This means any test for independence must be able to assess whether the outcomes of one variable affect the other.

I've not been able to find any useful sources that support, with reasoning, the claim that McNemar's test measures independence or dependence; in fact, some sources directly refute this or simply give different explanations:

Chi squared test of independence vs McNemars

CV answers here

Comparison with several contingency table statistics

Independence vs McNemar's

The statistical definition of independence is that one variable is not dependent on or conditional on the status of another, i.e. $P(X \mid Y) = P(X)$,

but McNemar's test evaluates probabilities conditional on the two tests disagreeing, i.e. something of the form $P(X \mid Y \neq X)$.

This means McNemar's cannot be measuring dependence or independence in this sense, since the probability it works with is conditioned on both $X$ and $Y$.
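To make the contrast concrete, writing $p_{ij}$ for the probability that the first test classifies a woman as $i$ and the second as $j$, the two null hypotheses are

$$
\text{independence: } p_{ij} = p_{i\cdot}\,p_{\cdot j} \text{ for all } i,j
\qquad\text{vs.}\qquad
\text{McNemar's null: } p_{01} = p_{10},
$$

the latter being equivalent (in a $2\times 2$ table) to marginal homogeneity, $p_{1\cdot} = p_{\cdot 1}$: the two tests have the same overall rate of positive classifications.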

What does McNemar's test measure?

What McNemar's test measures is extremely sensitive to the study design, and the design must be understood carefully in order to interpret what the test is telling you.

In the confusion matrix you present, the results are from paired tests carried out to predict the same binary outcome. There is no assessment of truth in this matrix, just agreement and disagreement. McNemar's test reduces this further, focusing only on the disagreements.

The main diagonal for two accurate tests would be predominantly determined by the fact that both tests are good predictors of the outcome (i.e. they are co-dependent on the outcome of interest). An interesting question is whether the noise elements of the two tests have any dependence separate from the agreement induced by both tests tracking the outcome. McNemar's is designed to ignore what is expected to be consistent and focus on what is not expected to be consistent, so it has some appeal here.

However, if the two tests are weak predictors of the outcome of interest, then correct predictions come both from the signal in the model and from random chance fluctuations in the noise that happen to give a correct result. The main diagonal then reflects where the underlying co-dependencies plus noise coincide. In this case, ignoring the main diagonal means the random agreements due to any interaction in the noise are never evaluated when assessing dependence. For this reason I do not believe McNemar's test can be claimed to assess dependence of test noise in the expected way.

The off-diagonal cells measure disagreements, so in a binary test one result is right and one is wrong, but the table cannot tell us which is which. A wrong result arises from noise, but a right result from a strongly performing model will be signal. This means we are not comparing noise in one test with noise in the other. What we are testing is noise in one test vs signal in the other, blindfolded as to which is which.

Things are much messier when comparing poor models, as a high proportion of 'correct' results will be flukes. This means we are comparing incorrect results with a mixture of signal and flukes.
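A small sketch of this point (assuming Python with statsmodels; all counts are made up): two tables with identical disagreement cells but very different amounts of agreement produce exactly the same McNemar result, which is why the test says nothing about how strongly the tests agree.

```python
from statsmodels.stats.contingency_tables import mcnemar

strong = [[90, 12], [4, 94]]   # tests agree on most cases (hypothetical counts)
weak   = [[30, 12], [4, 34]]   # tests agree far less often (hypothetical counts)

for tbl in (strong, weak):
    print(mcnemar(tbl, exact=True).pvalue)   # identical p-values: only the 12 and 4 matter
```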

What does it mean?

McNemar's is assessing whether there is any skew in how the tests disagree. This is a useful statistic for understanding which test tends to skew more towards positive or negative results. My interpretation of the p-value being below your alpha level is that food frequency questionnaires classify women as compliant with the RDA more often than food diaries do.

ReneBt