Is there a statistical test for the difference between two confusion matrices?

asked by mamatv
  • 1
    If you have the possibility to do a step back, have you tried to compare the Area Under the Curve of your both outcome, first? – YCR Feb 03 '16 at 12:41
  • Yeah, I calculated the AUC, and the results showed that the AUC of one confusion matrix is bigger. However, the difference is not large (0.91 versus 0.88), so I want a statistical test [a bootstrap sketch for comparing the AUCs appears after the comments]. – mamatv Feb 03 '16 at 12:57
  • I don't know if you can find a significant difference between the two matrices as a whole, but you might be able to find a significant difference between a metric computed from each matrix. For example, while you might not be able to conclude that the two confusion matrices are different, you should be able to estimate recall (or precision, or F1, or whatever) from each matrix, calculate a standard error for each, and then conduct a hypothesis test to see whether the two values differ significantly [a sketch of this comparison appears after the comments]. – Matt Brems Feb 03 '16 at 13:00
  • Attempting to find a significant difference between the matrices themselves will not only prove tricky; it might also prove uninformative. A statistically significant difference might suggest that at least one value is different in the two matrices but provide no additional information as to which metric is/metrics are different. – Matt Brems Feb 03 '16 at 13:01
  • 1
    Do you have access to the underlying models w/ their predicted probabilities & the correct classifications? – gung - Reinstate Monica Feb 03 '16 at 13:03
  • @gung yes, I do. What should I do with this information? – mamatv Feb 03 '16 at 13:11
  • @MattBrems How can I calculate a confidence interval for recall/precision/F1 from a confusion matrix? – mamatv Feb 03 '16 at 13:11
  • I'm not sure. I would Google that or attempt to derive it directly. To gung's point, it might make more sense to work with the original data rather than the summary statistics depending on what question you seek to answer. – Matt Brems Feb 03 '16 at 13:14
  • You could plot precision as a function of the threshold for the two models on the same graph, and do the same for recall, fall-out, and false omission rate. It may give you ideas [a plotting sketch follows the comments]. – YCR Feb 03 '16 at 13:36
  • The fact that you can state whether a classification is correct or incorrect for every pattern according to the 2 models means you have *matched pairs* of binary data. The appropriate test for this is McNemar's test. I have an answer explaining this here: [Compare classification performance of two heuristics](http://stats.stackexchange.com/a/185504/7290) [a minimal McNemar sketch appears after the comments]. – gung - Reinstate Monica Feb 03 '16 at 16:32
  • I will close this thread as a duplicate. If you still have any questions after reading that, come back here & edit your Q to state what you have learned & what you still need to understand. Then we can provide the information you need w/o simply duplicating material elsewhere that already didn't help you. – gung - Reinstate Monica Feb 03 '16 at 16:32
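
Since the asker does have the predicted probabilities (per the exchange with gung above), one simple way to attach an uncertainty statement to the 0.91-vs-0.88 AUC gap is a paired bootstrap. The sketch below is a minimal illustration in Python, assuming scikit-learn's `roc_auc_score`; `y_true`, `score_a`, and `score_b` are synthetic stand-ins for the real labels and the two models' predicted probabilities.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for the shared labels and the two models' predicted probabilities.
rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=1000)
score_a = np.clip(0.35 * y_true + rng.normal(0.4, 0.25, size=1000), 0, 1)
score_b = np.clip(0.30 * y_true + rng.normal(0.4, 0.25, size=1000), 0, 1)

# Paired bootstrap: resample the same patterns for both models, so the
# distribution of the AUC difference reflects the correlation between them.
n_boot = 2000
diffs = np.full(n_boot, np.nan)
idx_all = np.arange(len(y_true))
for b in range(n_boot):
    idx = rng.choice(idx_all, size=len(y_true), replace=True)
    if y_true[idx].min() == y_true[idx].max():
        continue  # resample contained only one class; AUC undefined, skip
    diffs[b] = roc_auc_score(y_true[idx], score_a[idx]) - roc_auc_score(y_true[idx], score_b[idx])

diffs = diffs[~np.isnan(diffs)]
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"AUC difference: 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes zero, resampling noise alone is unlikely to explain the gap; DeLong's test is the more standard tool for comparing correlated AUCs, but the bootstrap is enough for a first look.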
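
Matt Brems's per-metric suggestion can be sketched directly from confusion-matrix counts: recall is a binomial proportion, so it gets a normal-approximation standard error, and the two recalls can be compared with a z-test. The counts below are placeholders rather than the asker's data, and treating the two estimates as independent is only an approximation when both models are scored on the same test set (which is exactly why gung's McNemar suggestion is preferable for matched data).

```python
import numpy as np
from scipy import stats

def recall_with_se(tp, fn):
    """Recall (sensitivity) from confusion-matrix counts, with a normal-approximation SE."""
    n = tp + fn
    p = tp / n
    se = np.sqrt(p * (1 - p) / n)
    return p, se

# Placeholder counts for the two confusion matrices (not the asker's data).
r1, se1 = recall_with_se(tp=420, fn=80)
r2, se2 = recall_with_se(tp=440, fn=60)

# z-test on the difference of the two recalls.
# Caveat: this treats the two estimates as independent, which is only
# approximately true when both models are evaluated on the same patterns.
z = (r1 - r2) / np.sqrt(se1**2 + se2**2)
p_value = 2 * stats.norm.sf(abs(z))
print(f"recall 1 = {r1:.3f}, recall 2 = {r2:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

The same recipe works for precision (denominator TP+FP) or any other cell-ratio metric; F1 needs a bootstrap or delta-method standard error instead of the simple binomial one.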
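
A rough version of YCR's threshold plots, assuming NumPy and matplotlib: sweep a decision threshold over the two models' predicted probabilities and plot precision, recall, fall-out, and false omission rate for each. Here `y_true`, `score_a`, and `score_b` are again synthetic stand-ins for the shared labels and the two models' scores.

```python
import numpy as np
import matplotlib.pyplot as plt

def metrics_vs_threshold(y_true, y_score, thresholds):
    """Precision, recall, fall-out (FPR) and false omission rate at each threshold."""
    precision, recall, fallout, fom = [], [], [], []
    for t in thresholds:
        y_pred = (y_score >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        # max(..., 1) guards against empty denominators at extreme thresholds
        precision.append(tp / max(tp + fp, 1))
        recall.append(tp / max(tp + fn, 1))
        fallout.append(fp / max(fp + tn, 1))
        fom.append(fn / max(fn + tn, 1))
    return precision, recall, fallout, fom

# Synthetic stand-ins: replace with the real labels and the two models' scores.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
score_a = np.clip(0.30 * y_true + rng.normal(0.40, 0.25, size=1000), 0, 1)
score_b = np.clip(0.25 * y_true + rng.normal(0.45, 0.25, size=1000), 0, 1)

thresholds = np.linspace(0.01, 0.99, 99)
names = ["precision", "recall", "fall-out", "false omission rate"]
fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True)
for scores, label in [(score_a, "model A"), (score_b, "model B")]:
    curves = metrics_vs_threshold(y_true, scores, thresholds)
    for ax, curve, name in zip(axes.ravel(), curves, names):
        ax.plot(thresholds, curve, label=label)
        ax.set_title(name)
        ax.set_xlabel("threshold")
for ax in axes.ravel():
    ax.legend()
plt.tight_layout()
plt.show()
```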
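
And a minimal sketch of the McNemar's test that gung points to, using `statsmodels`. The per-pattern correctness indicators below are synthetic stand-ins; with the real data they would come from comparing each model's predictions to the true labels on the same test set.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# correct_a / correct_b: for each test pattern, did model A / model B classify
# it correctly?  These are matched pairs because both refer to the same patterns.
# Synthetic stand-ins below; derive the real indicators from the models' predictions.
rng = np.random.default_rng(1)
correct_a = rng.random(1000) < 0.91
correct_b = rng.random(1000) < 0.88

# 2x2 table of paired outcomes: rows = model A correct/incorrect,
# columns = model B correct/incorrect.
table = np.array([
    [np.sum(correct_a & correct_b),  np.sum(correct_a & ~correct_b)],
    [np.sum(~correct_a & correct_b), np.sum(~correct_a & ~correct_b)],
])

# McNemar's test looks only at the discordant cells (A right / B wrong and vice versa).
result = mcnemar(table, exact=True)
print(f"discordant pairs: {table[0, 1]} vs {table[1, 0]}, p-value = {result.pvalue:.4f}")
```

Only the discordant cells enter the test, which is what makes it appropriate for matched pairs: patterns that both models get right (or both get wrong) carry no information about which model is better.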

0 Answers