I ran an experiment using real and synthesised movies of talking heads. Participants were shown real and synthesised movies in random order and, for each one, had to guess whether it was real or synthesised.

Using MATLAB's binofit, I got an estimated proportion correct of 0.58: of 2040 samples, 1189 were guessed correctly, which I think shows a significant difference from chance.

I also tried McNemar's test. The confusion matrix is:

[506 359]
[449 582]

where top left is 'said synth when synth', top right is 'said synth when real', bottom left is 'said real when synth' and bottom right is 'said real when real'.

McNemar's test on that matrix gives me a test statistic of 9.91, which again I think shows that there is a significant difference.

Are there any other tests I can try?

gung - Reinstate Monica

1 Answer

Neither of those tests is right for your data. McNemar's test is a within-subjects test of equality of proportions, but it is for data where you have only 2 measurements per subject. In addition, by using the proportion correct, it would not allow you to disentangle a participant's ability to get the right answer from their tendency to favor one response (e.g., to say 'real'). (If you want more information about McNemar's test, I discuss it rather thoroughly here: What is the difference between McNemar's test and the chi-squared test, and how do you know when to use each?) On the other hand, the binomial test assumes your data are independent, which, as @MichaelMayer points out, can't possibly be true of your data.

One approach might be to use signal detection theory. For each participant, the proportion of real movies to which the participant said 'real' is the hit rate ($h$); the proportion of synthesized movies to which they said 'real' is the false alarm rate ($fa$). With these two proportions, you can calculate the sensitivity index ($d'$) for each participant ($i$) by converting each proportion into a $z$-score and subtracting:

$$ d'_i = \Phi^{-1}(h_i) - \Phi^{-1}(fa_i) $$

Here $\Phi$ is the standard normal cumulative distribution function (CDF), and $\Phi^{-1}$ is its inverse. It's been a long time since I've used MATLAB, but I gather the function is norminv(). The resulting $d'$ scores are uncontaminated by each participant's bias to respond 'real'. With these scores, you can perform a one-sample $t$-test of whether the mean $d'$ differs from $0$ (a $d'$ of $0$ means no ability to discriminate), which would allow you to test whether people can distinguish the real movies from the synthesized ones.
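To make the $d'$ calculation concrete, here is a minimal Python sketch using only the standard library. The per-participant counts are made up purely for illustration, the function names `d_prime` and `one_sample_t` are hypothetical, and the small correction added to the counts (so that a hit or false alarm rate of exactly 0 or 1 does not give an infinite $z$-score) is my assumption rather than something your data necessarily need.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def d_prime(hits, n_real, false_alarms, n_synth):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    hits:         'real' responses to real movies
    false_alarms: 'real' responses to synthesized movies
    Assumption: add 0.5 to each count and 1 to each total so that
    rates of exactly 0 or 1 still give finite z-scores.
    """
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    h = (hits + 0.5) / (n_real + 1)
    fa = (false_alarms + 0.5) / (n_synth + 1)
    return z(h) - z(fa)


def one_sample_t(values, mu0=0.0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / sqrt(n))
    return t, n - 1


# Made-up counts, for illustration only:
# (hits, n_real, false_alarms, n_synth) for each participant
participants = [
    (14, 20, 9, 20),
    (12, 20, 10, 20),
    (15, 20, 8, 20),
    (11, 20, 11, 20),
    (13, 20, 7, 20),
]

d_primes = [d_prime(*p) for p in participants]
t_stat, df = one_sample_t(d_primes)
print("d' per participant:", [round(d, 3) for d in d_primes])
print("t =", round(t_stat, 3), "on", df, "degrees of freedom")
```

The sketch stops at the $t$ statistic and its degrees of freedom; the $p$-value would come from the $t$ distribution with that many degrees of freedom. In MATLAB, norminv() and ttest() should play the same roles as `inv_cdf` and `one_sample_t` here.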

gung - Reinstate Monica