
Consider a scenario where you are provided with a KnownLabel matrix and a PredictedLabel matrix. I would like to measure the goodness of the PredictedLabel matrix against the KnownLabel matrix.

But the challenge here is that in the KnownLabel matrix, some rows have only one 1 while other rows have many 1's (those instances are multi-labeled). An example of a KnownLabel matrix is given below.

$$A = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 \\
0 & 1 & 1 & 1
\end{bmatrix}$$

In the above matrix, data instances 1 and 2 are single-label, data instances 3 and 4 have two labels, and data instance 5 has three labels.

Now I have the PredictedLabel matrix for these data instances, produced by an algorithm.

I would like to know the various measures that can be used to evaluate the goodness of the PredictedLabel matrix against the KnownLabel matrix.

I can think of the Frobenius norm of the difference between them as one such measure. But I am looking for measures such as accuracy $\left(= \frac{\text{correctly predicted instances}}{\text{total instances}}\right)$.
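For concreteness, here is a minimal NumPy sketch of that Frobenius-norm measure, using the matrix $A$ above and a made-up PredictedLabel matrix (the predicted values are purely illustrative):

```python
import numpy as np

# KnownLabel matrix A from the example above (rows = instances, columns = labels).
A = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]])

# Hypothetical PredictedLabel matrix, just to have something to compare against.
P = np.array([[1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 1, 1, 1]])

# Frobenius norm of the difference: 0 means a perfect prediction,
# larger values mean more label assignments disagree.
print(np.linalg.norm(A - P, ord='fro'))  # sqrt(2) ≈ 1.414 for this pair
```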

Here, how can we define $\rm Correctly\_predicted$ for multi-label data instances?

Learner
  • (+1) Sidenote: Is there a specific reason that you haven't accepted an answer in the majority of your questions? Why didn't you post a comment when the provided answer did not solve your problem? E.g.: http://stats.stackexchange.com/questions/9947/centroid-matching-problem – mlwida Jul 06 '11 at 11:44

3 Answers


(1) gives a nice overview:

[Two screenshots from (1), tabulating the definitions of the multi-label evaluation metrics (exact match ratio, accuracy, precision, recall, $F_1$, Hamming loss, etc.).]

The Wikipedia page on multi-label classification contains a section on evaluation metrics as well.

I would add a warning that in the multilabel setting, accuracy is ambiguous: it might either refer to the exact match ratio or the Hamming score (see this post). Unfortunately, many papers use the term "accuracy" without saying which one they mean.
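To make the distinction concrete, here is a minimal sketch, assuming (as in the survey's notation) that each ground truth $Y_i$ and prediction $Z_i$ is a binary indicator vector over $k$ labels, stored as a row of a matrix. The handling of the empty-union case is one possible convention, not the only one (see the comments below):

```python
import numpy as np

def exact_match_ratio(Y, Z):
    # Strict "accuracy": an instance counts only if every label matches.
    return np.mean(np.all(Y == Z, axis=1))

def hamming_score(Y, Z):
    # Per-instance Jaccard index |Y_i ∩ Z_i| / |Y_i ∪ Z_i|, averaged over
    # instances (the "accuracy" of the survey). Instances with no true and no
    # predicted labels are counted as correct here -- one possible convention.
    intersection = np.logical_and(Y, Z).sum(axis=1)
    union = np.logical_or(Y, Z).sum(axis=1)
    per_instance = np.where(union == 0, 1.0, intersection / np.maximum(union, 1))
    return per_instance.mean()

# Toy example: 5 instances, 4 labels (values purely illustrative).
Y = np.array([[1,0,0,0], [0,1,0,0], [0,1,1,0], [0,0,1,1], [0,1,1,1]])
Z = np.array([[1,0,0,0], [0,1,1,0], [0,1,1,0], [0,0,1,0], [0,1,1,1]])

print(exact_match_ratio(Y, Z))  # 0.6: only 3 of 5 rows match exactly
print(hamming_score(Y, Z))      # 0.8: partial credit for rows 2 and 4
```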


(1) Sorower, Mohammad S. "A literature survey on algorithms for multi-label learning." Oregon State University, Corvallis (2010).

Franck Dernoncourt
  • Do these definitions go against the general definitions for Precision and Recall? I have always read that precision should divide by TP+FP and recall should divide by TP+FN (the proposed definitions here do the opposite, if I understood well). – tomasyany Jul 24 '17 at 14:56
  • No, the definitions in the paper are correct. Here $Y_i \in \mathcal{Y} = \{0, 1\}^k$ is the ground truth vector of labels for the $i$th sample, and $Z_i = h(\mathbf{x}_i) \in \{0, 1\}^k$ is the predicted vector of labels, where $h$ denotes a multi-label classifier. Perhaps you've mistakenly mixed up the meanings of $Y_i$ and $Z_i$. – constt Sep 07 '17 at 02:41
  • for the `accuracy` measure, how do you elegantly handle cases where the denominator `|Y + Z| == 0`? – ihadanny Jul 03 '18 at 05:57
  • 4
    @tomasyany is referring to the text definitions (not the formulas), which do appear to be switched around. – Narfanar Dec 28 '18 at 09:07
  • And this AP definition looks more like mAP (mean AP), no? What's referred to as 'Accuracy' is the average IoU. The terms are quite a bit confusing overall. – Narfanar Dec 28 '18 at 09:58
  • @Mussri For Average Precision vs. mean Average Precision: [Average Precision in Object Detection](https://stats.stackexchange.com/a/352798/12359) – Franck Dernoncourt Dec 28 '18 at 10:01
  • @FranckDernoncourt , That does support that the definition in the survey is actually for mAP. AP would be the individual sample score, and we get the mean of that. – Narfanar Dec 28 '18 at 12:05
  • @ihadanny, use $I(|Z| = 0)$ in that case, i.e. count the instance as correct only when the prediction is also empty. – Narfanar Dec 28 '18 at 12:06
  • Text descriptions of recall and precision are indeed wrong (although the formulas seem OK). It makes me doubt the quality of this source... – hans Sep 10 '19 at 14:24

The Hamming Loss is probably the most widely used loss function in multi-label classification.

Have a look at Empirical Studies on Multi-label Classification and Multi-Label Classification: An Overview, both of which discuss this.
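As a rough sketch (assuming 0/1 label matrices of shape n_instances × n_labels), the Hamming loss is simply the fraction of individual label assignments that are wrong:

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    # Fraction of label assignments that differ from the ground truth
    # (0 = perfect, 1 = every single label wrong).
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    return np.mean(Y_true != Y_pred)

# 2 instances, 4 labels: exactly 1 of the 8 label assignments is wrong.
print(hamming_loss([[1, 0, 0, 0], [0, 1, 1, 0]],
                   [[1, 0, 0, 1], [0, 1, 1, 0]]))  # 0.125
```

(scikit-learn exposes this metric as `sklearn.metrics.hamming_loss`.)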

tdc

"Correctly predicted" is the intersection between the set of suggested labels and the set of expected ones; the "total" is the union of those two sets (with no duplicate counting).

So, given a single example where you predict classes A, G, E and the test case has E, A, H, P as the correct ones, you end up with Accuracy = |{A,G,E} ∩ {E,A,H,P}| / |{A,G,E} ∪ {E,A,H,P}| = 2/5.
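In code, this per-example accuracy is just the Jaccard index of the two label sets; a minimal sketch (the empty-set convention is an assumption on my part):

```python
def instance_accuracy(predicted, expected):
    # Jaccard index: |predicted ∩ expected| / |predicted ∪ expected|.
    predicted, expected = set(predicted), set(expected)
    union = predicted | expected
    # If both sets are empty, count the instance as correct (a common convention).
    return len(predicted & expected) / len(union) if union else 1.0

print(instance_accuracy({'A', 'G', 'E'}, {'E', 'A', 'H', 'P'}))  # 2/5 = 0.4
```

Averaging this score over all test instances then gives an overall accuracy in the sense asked about.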