How to compare the F-measure values?

Question

I want to compare the F-measure of method 1 (M1) with the results of three other methods:

F-measure of M1: 0.800
F-measure of M2: 0.630
F-measure of M3: 0.619
F-measure of M4: 0.612

How can I show that M1 significantly improves the result compared with the other methods?

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

This quote is from @Minkoo Seo's answer to How to interpret F-measure values?:

I cannot think of an intuitive meaning of the F measure, because it's just a combined metric. What's more intuitive than F-me[a]sure, of course, is precision and recall.

But using two values, we often cannot determine if one algorithm is superior to another.

Based on this answer, we can conclude that a significance test on F-measures makes no sense.

In terms of selecting your method though, you need to analyse precision, recall and parsimony of the classifier, which is a judgement call.

score 1 · Answer 2 · answered Oct 14 '16 at 13:20

To say that M1 significantly improved the result compared with the other methods, you have to use a statistical test and show that the difference is statistically significant.

To do this, you can run the model-generation process multiple times (perhaps 100 times, using, for example, different partitions of the training and test data) and collect the F-score of each run. Then use ANOVA to statistically compare the F-scores of the four methods.
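As a rough sketch of that procedure, the snippet below computes the one-way ANOVA F statistic by hand and estimates its p-value with a permutation test, which I swapped in for the usual F-distribution lookup so that only the standard library is needed. The per-run scores are placeholders simulated around the values in the question, with an assumed spread of 0.03 and 20 runs per method; the names `anova_f` and `permutation_p_value` are made up for illustration. Replace `scores` with the F-scores from your own repeated runs.

```python
import random


def anova_f(groups):
    """One-way ANOVA F statistic: between-group / within-group mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)
    k, n = len(groups), len(all_scores)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of random relabelings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = anova_f(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if anova_f(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# PLACEHOLDER DATA: simulated F-scores, not real results.
# Assumption: 20 runs per method, standard deviation 0.03.
rng = random.Random(42)
centers = {"M1": 0.800, "M2": 0.630, "M3": 0.619, "M4": 0.612}
scores = {m: [rng.gauss(c, 0.03) for _ in range(20)] for m, c in centers.items()}

f_stat, p_value = permutation_p_value(list(scores.values()))
print(f"F = {f_stat:.2f}, permutation p = {p_value:.4f}")
```

A small p-value here only says that at least one method differs from the others. To claim that M1 specifically is better, you would follow up with pairwise comparisons of M1 against each other method, corrected for multiple comparisons.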
