
I have $n$ individuals, and for each individual I have two measurements, one from each of two devices (device X and device Y). I know the ground truth for each measurement, so I can classify each measurement as accurate or inaccurate. Thus, for each individual I effectively have a boolean value that indicates whether device X was correct (say $x_i$) and a boolean value that indicates whether device Y was correct (say $y_i$).

Is there a good statistical test to use to compare the accuracy rate of the two devices?

In particular, suppose I notice that device X's accuracy rate appears to be higher than device Y's accuracy rate, based upon the $n$ observations (i.e., $(x_1+\dots+x_n)/n > (y_1+\dots+y_n)/n$, where $x_i,y_i = 1$ means it was correct and $0$ means it was incorrect). Now I'd like to test whether the difference in observed accuracy rates is statistically significant. Can I compute a $p$-value for the null hypothesis that the two devices' underlying accuracy rates are actually the same?

Should I use the Wilcoxon signed-rank test? A paired Student's t-test? Some sort of paired Welch t-test (does such a thing even exist)? None of those seems like an obvious fit to me. I know the data isn't normally distributed (each observation is presumably Bernoulli), so a t-test isn't a perfect match; on the other hand, I've read that in practice the t-test is fairly robust to deviations from normality, so maybe it is OK? And I can't tell whether a Wilcoxon signed-rank test takes into account the prior knowledge that the data is Bernoulli distributed. Anyway, what would be the most appropriate methodology?

D.W.
  • Sounds like you probably want [McNemar's test](http://en.wikipedia.org/wiki/McNemar%27s_test) – Glen_b May 31 '14 at 03:46
  • see also [here](http://stats.stackexchange.com/questions/73576/paired-t-test-for-binary-data?rq=1) – Glen_b May 31 '14 at 03:49

1 Answer


McNemar's test solves this problem. (Thanks to Glen_b for mentioning this!) It is intended for paired data where the observations are boolean, which is exactly this setting. It is also easy to compute.
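To make "easy to compute" concrete, here is a minimal sketch in plain Python, assuming `x` and `y` are equal-length lists of 0/1 values as in the question. The function names `mcnemar_exact` and `mcnemar_chi2` are mine, not from any library. The test only looks at the discordant pairs: $b$ = number of individuals where X was right and Y was wrong, and $c$ = number where X was wrong and Y was right. Under the null hypothesis, $b$ is Binomial$(b+c, 1/2)$. The first function uses that exact binomial distribution; the second uses the common chi-square approximation with continuity correction, which is only reasonable when $b+c$ is not small.

```python
from math import comb, erfc, sqrt


def discordant_counts(x, y):
    """Return (b, c): b = X correct & Y wrong, c = X wrong & Y correct."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length (paired data)")
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    return b, c


def mcnemar_exact(x, y):
    """Two-sided exact McNemar test; returns the p-value.

    Under the null, b ~ Binomial(b + c, 1/2), so the p-value is twice the
    lower tail at min(b, c), capped at 1.
    """
    b, c = discordant_counts(x, y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of any difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def mcnemar_chi2(x, y):
    """Chi-square approximation (1 d.f.) with continuity correction."""
    b, c = discordant_counts(x, y)
    n = b + c
    if n == 0:
        return 1.0
    stat = (abs(b - c) - 1) ** 2 / n
    # Survival function of chi-square with 1 d.f.: P(Z^2 > stat)
    return erfc(sqrt(stat / 2.0))


# Made-up illustration data (not from the question): 10 individuals.
x = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y = [1, 0, 0, 0, 0, 1, 0, 1, 1, 0]
print(discordant_counts(x, y))  # b = 5, c = 1
print(mcnemar_exact(x, y))
print(mcnemar_chi2(x, y))
```

Note that pairs where both devices were right, or both were wrong, do not affect the result at all. If you would rather not roll your own, I believe statsmodels ships a `mcnemar` function that takes the 2x2 table, but check its documentation for the exact arguments.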

See also [Paired t-test for binary data](http://stats.stackexchange.com/questions/73576/paired-t-test-for-binary-data?rq=1) for another instance of a closely related statistical hypothesis testing problem.

D.W.