This quote is from @Minkoo Seo's answer to How to interpret F-measure values?:
I cannot think of an intuitive meaning of the F measure, because it's just a combined metric. What's more intuitive than F-me[a]sure, of course, is precision and recall.
But using two values, we often cannot determine if one algorithm is superior to another.
Taking this answer, we can conclude that a significance test of F-measures makes no sense.
In terms of selecting your method though, you need to analyse precision, recall and parsimony of the classifier, which is a judgement call.