I have a hierarchical multi-class classification system that classifies records into about 500 different categories. I want to summarise the performance of the classifier in a simple way.

A measure of accuracy on validation data is easy to implement: correctly coded/all coded. For each class, we can look at binary measures of precision and recall to summarise the performance relative to that class.
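
For concreteness, here is a minimal sketch of those two calculations in plain Python. The function name `per_class_precision_recall` and the toy labels are made up for illustration, and returning 0.0 for a class that is never predicted (or never occurs) is a convention assumed here, not something the question specifies.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall, support)}, one-vs-rest per class.

    Hypothetical helper for illustration. Undefined ratios (a class that is
    never predicted, or never occurs) are reported as 0.0 by assumption.
    """
    true_positives = Counter()  # records correctly assigned to the class
    predicted = Counter()       # records assigned to the class
    actual = Counter()          # records that truly belong to the class
    for t, p in zip(y_true, y_pred):
        actual[t] += 1
        predicted[p] += 1
        if t == p:
            true_positives[t] += 1

    result = {}
    for label in sorted(set(actual) | set(predicted)):
        precision = true_positives[label] / predicted[label] if predicted[label] else 0.0
        recall = true_positives[label] / actual[label] if actual[label] else 0.0
        result[label] = (precision, recall, actual[label])
    return result


# Toy validation data (invented for the example).
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]

# Accuracy: correctly coded / all coded.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
stats = per_class_precision_recall(y_true, y_pred)
```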

However, there doesn't seem to be a generally accepted way to combine the binary precisions and recalls into summaries of precision and recall across the entire set of classes. There appear to be a few ways to approach this summary:

  1. Take a simple average (arithmetic/geometric/harmonic) of each class's precision/recall.

  2. Take a weighted average (weighted by number of examples, etc) of each class's precision/recall.

  3. Use bookmaker's informedness/markedness, which seems to have a natural generalisation to the multiclass context.
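
To make options 1 and 2 concrete, here is a rough sketch of the arithmetic versions, using made-up per-class figures. The names `macro_average` and `weighted_average` are hypothetical, and weighting by the number of true examples per class is just one of the possible weightings.

```python
def macro_average(values):
    """Simple (unweighted) arithmetic mean: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Arithmetic mean with each class weighted, here by its example count."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical per-class figures: (precision, recall, number of true examples).
per_class = {
    "a": (1.0, 2 / 3, 3),
    "b": (0.5, 0.5, 2),
    "c": (0.5, 1.0, 1),
}
precisions = [p for p, _, _ in per_class.values()]
recalls = [r for _, r, _ in per_class.values()]
supports = [n for _, _, n in per_class.values()]

macro_precision = macro_average(precisions)
macro_recall = macro_average(recalls)
weighted_precision = weighted_average(precisions, supports)
weighted_recall = weighted_average(recalls, supports)
```

One thing this makes visible: when every record has exactly one true class, recall weighted by the number of true examples is the total number of correct records divided by the total number of records, i.e. plain accuracy. The simple average, by contrast, lets rare classes count as much as common ones.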

Are there particular advantages to any one of these approaches? Is there a generally accepted way to do this that I've just been missing?

RoryT
  • Potential duplicate of https://stats.stackexchange.com/questions/51296/how-do-you-calculate-precision-and-recall-for-multiclass-classification-using-co – Brandmaier Jun 30 '17 at 07:37
  • @Brandmaier Thanks for your comment. That's not really the same question - the question there is about computing the binary precision/recalls, which I'm comfortable with. I'm asking for good practice in summarising all of the binary precisions/recalls into a single measure over all of the classes. – RoryT Jul 03 '17 at 00:23

1 Answer

As far as I know, there isn't a de facto standard way of calculating precision and recall for multi-class classification.

Your approaches are what I too would try:

  1. Class-wise harmonic mean.
  2. Class-wise weighted harmonic mean (if the classes are imbalanced), with each class's weight equal to its share of the data (i.e. class weight = number of class examples / number of total examples).
  3. Class-wise geometric mean (another approach if the classes are imbalanced).
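
As a rough illustration of the three options, the sketch below reads "class-wise" as averaging one metric (recall here) across the classes, which is an assumption about what is meant. The function names and the example values are made up, and returning 0.0 when any class scores zero is a convention chosen for the sketch.

```python
import math


def harmonic_mean(values, weights=None):
    """(Weighted) harmonic mean across classes; 0.0 if any value is zero."""
    weights = weights or [1.0] * len(values)
    if any(v == 0 for v in values):
        return 0.0
    return sum(weights) / sum(w / v for w, v in zip(weights, values))


def geometric_mean(values):
    """Geometric mean across classes; 0.0 if any value is zero."""
    if any(v == 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-class recalls and class sizes.
recalls = [2 / 3, 0.5, 1.0]
class_counts = [3, 2, 1]
total = sum(class_counts)

# Class weight = number of class examples / number of total examples.
class_weights = [n / total for n in class_counts]

hm = harmonic_mean(recalls)
weighted_hm = harmonic_mean(recalls, class_weights)
gm = geometric_mean(recalls)
```

Both the harmonic and the geometric mean are pulled towards the worst-performing classes, and a single class with zero recall drives either summary to zero. With around 500 classes that is worth keeping in mind.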

There are also other metrics to evaluate the performance of your model, besides precision and recall:

Djib2011