Are binomial and McNemar's tests right for my data?

Question

I have performed an experiment using real and synthesised movies of talking heads. My participants were shown the movies one at a time, in a random order, and had to guess whether each movie was real or synthesised.

Using MATLAB's binofit, I got an estimated proportion correct of 0.58: of 2040 samples, 1189 were guessed correctly, which I think shows a significant difference from chance.

I also tried McNemar's test. The confusion matrix is:

[506 359]
[449 582]

where top left is 'said synth when synth', top right is 'said synth when real', bottom left is 'said real when synth' and bottom right is 'said real when real'.

McNemar's test on that matrix gives me a statistic of 9.91, which again I think shows that there is a significant difference.

Are there any other tests I can try?

Maybe you are using wrong sample sizes? I doubt that you had 2000 participants accrued for this experiment. — Michael M, Apr 02 '14 at 11:29
No, not 2040 participants: 17 participants, each shown 120 movies. — shaw2thefloor, Apr 02 '14 at 12:08
Judgements of the same person are unlikely to be independent, thus violating key assumptions of your tests. I would use their results only with great care. — Michael M, Apr 02 '14 at 12:29
So since as you point out they are not independent samples, what would be a more appropriate test? — shaw2thefloor, Apr 02 '14 at 13:58

score 1 · Accepted Answer · edited Apr 13 '17 at 12:44

Neither of those tests is right for your data. McNemar's test is a within-subjects test of equality of proportions, but it is for data where you have only 2 measurements per subject. In addition, by using the proportion correct, it would not allow you to disentangle the ability to get the right answer from the tendency to respond. (If you want more information about McNemar's test, I discuss it rather thoroughly here: What is the difference between McNemar's test and the chi-squared test, and how do you know when to use each?) On the other hand, the binomial test assumes your data are independent, which, as @MichaelMayer points out, can't possibly be true of your data.

One approach might be to use signal detection theory. For each participant, the proportion of real movies to which the participant said 'real' is the hit rate ($h$); the proportion of synthesized movies to which they said 'real' is the false alarm rate ($fa$). With these two proportions, you can calculate the sensitivity index ($d'$) for each participant ($i$) by converting each proportion into a $z$-score and subtracting:

$$ d'_i = \Phi^{-1}(h_i) - \Phi^{-1}(fa_i) $$

Here $\Phi$ is the standard normal cumulative distribution function (CDF), and $\Phi^{-1}$ is its inverse. It's been a long time since I've used MATLAB, but I gather the function is norminv(). The resulting $d'$ scores are uncontaminated by each participant's bias to respond 'real'. With these scores, you can perform a one-sample $t$-test, which would allow you to test whether people can distinguish the real movies from the synthesized ones.
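
For concreteness, here is a minimal Python sketch of the per-participant $d'$ calculation followed by the one-sample $t$ statistic against zero. It uses only the standard library. The counts are made up purely for illustration (they are not the asker's data), and the names `counts` and `d_prime` are hypothetical. The sketch also assumes a log-linear correction (adding 0.5 to each count and 1 to each total), which is one common convention for keeping $\Phi^{-1}$ finite when a rate is exactly 0 or 1; that correction is my assumption, not part of the method described above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical per-participant counts, for illustration only:
# (said 'real' when real, number of real movies,
#  said 'real' when synth, number of synth movies)
counts = [
    (40, 60, 25, 60),
    (35, 60, 30, 60),
    (45, 60, 20, 60),
    (38, 60, 28, 60),
]

def d_prime(hits, n_real, false_alarms, n_synth):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate)."""
    # Assumed log-linear correction: avoids rates of exactly 0 or 1,
    # for which the inverse normal CDF is infinite.
    h = (hits + 0.5) / (n_real + 1)
    fa = (false_alarms + 0.5) / (n_synth + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(h) - z(fa)

# One d' score per participant.
d = [d_prime(*c) for c in counts]

# One-sample t statistic for H0: mean d' = 0 (no discrimination).
n = len(d)
t_stat = mean(d) / (stdev(d) / sqrt(n))
df = n - 1

print("d' scores:", [round(x, 3) for x in d])
print("t =", round(t_stat, 3), "with", df, "degrees of freedom")
```

The $p$-value then comes from the $t$ distribution with $n - 1$ degrees of freedom; a statistics library can do that step for you (for example `scipy.stats.ttest_1samp` in Python, or `ttest` in MATLAB's Statistics Toolbox). With 17 participants you would have 16 degrees of freedom.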

Thanks very much for your answer gung. I'll have a look at this test this evening. — shaw2thefloor, Apr 08 '14 at 18:13
