
Consider a scenario where you are provided with a KnownLabel matrix and a PredictedLabel matrix. I would like to measure the goodness of the PredictedLabel matrix against the KnownLabel matrix.

But the challenge here is that in the KnownLabel matrix, some rows have only one 1 while other rows have many 1's (those instances are multi-labeled). An example of a KnownLabel matrix is given below.

$$A = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 \\
0 & 1 & 1 & 1
\end{bmatrix}$$

In the above matrix, data instances 1 and 2 are single-label, data instances 3 and 4 have two labels, and data instance 5 has three labels.

Now I have the PredictedLabel matrix for these data instances, produced by an algorithm.

I would like to know the various measures that can be used to evaluate the goodness of the PredictedLabel matrix against the KnownLabel matrix.

I can think of the Frobenius norm of the difference between them as one such measure. But I am looking for measures such as accuracy $\left(= \frac{\text{correctly predicted instances}}{\text{total instances}}\right)$.
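For concreteness, here is a minimal NumPy sketch of that Frobenius-norm measure, using the matrix $A$ above and a made-up PredictedLabel matrix (the predicted values are purely illustrative):

```python
import numpy as np

# KnownLabel matrix A from the example above (rows = instances, columns = labels).
A = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]])

# Hypothetical PredictedLabel matrix, just to have something to compare against.
P = np.array([[1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 1, 1, 1]])

# Frobenius norm of the difference: 0 means a perfect prediction,
# larger values mean more label assignments disagree.
print(np.linalg.norm(A - P, ord='fro'))  # sqrt(2) ≈ 1.414 for this pair
```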

Here, how can we define $\rm Correctly\_predicted$ for multi-label data instances?

Learner
  • (+1) Sidenote: Is there a specific reason that you haven't accepted an answer in the majority of your questions? Why didn't you post a comment when the provided answer did not solve your problem? E.g.: http://stats.stackexchange.com/questions/9947/centroid-matching-problem – mlwida Jul 06 '11 at 11:44

3 Answers


(1) gives a nice overview:

[Two screenshots from (1), tabulating the definitions of the multi-label evaluation metrics (exact match ratio, accuracy, precision, recall, $F_1$, Hamming loss, etc.).]

The Wikipedia page on multi-label classification contains a section on evaluation metrics as well.

I would add a warning that in the multilabel setting, accuracy is ambiguous: it might either refer to the exact match ratio or the Hamming score (see this post). Unfortunately, many papers use the term "accuracy" without saying which one they mean.
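To make the distinction concrete, here is a minimal sketch, assuming (as in the survey's notation) that each ground truth $Y_i$ and prediction $Z_i$ is a binary indicator vector over $k$ labels, stored as a row of a matrix. The handling of the empty-union case is one possible convention, not the only one (see the comments below):

```python
import numpy as np

def exact_match_ratio(Y, Z):
    # Strict "accuracy": an instance counts only if every label matches.
    return np.mean(np.all(Y == Z, axis=1))

def hamming_score(Y, Z):
    # Per-instance Jaccard index |Y_i ∩ Z_i| / |Y_i ∪ Z_i|, averaged over
    # instances (the "accuracy" of the survey). Instances with no true and no
    # predicted labels are counted as correct here -- one possible convention.
    intersection = np.logical_and(Y, Z).sum(axis=1)
    union = np.logical_or(Y, Z).sum(axis=1)
    per_instance = np.where(union == 0, 1.0, intersection / np.maximum(union, 1))
    return per_instance.mean()

# Toy example: 5 instances, 4 labels (values purely illustrative).
Y = np.array([[1,0,0,0], [0,1,0,0], [0,1,1,0], [0,0,1,1], [0,1,1,1]])
Z = np.array([[1,0,0,0], [0,1,1,0], [0,1,1,0], [0,0,1,0], [0,1,1,1]])

print(exact_match_ratio(Y, Z))  # 0.6: only 3 of 5 rows match exactly
print(hamming_score(Y, Z))      # 0.8: partial credit for rows 2 and 4
```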


(1) Sorower, Mohammad S. "A literature survey on algorithms for multi-label learning." Oregon State University, Corvallis (2010).

Franck Dernoncourt
  • Do these definitions go against the general definitions for Precision and Recall? I have always read that precision should divide by TP+FP and recall should divide by TP+FN (the proposed definitions here do the opposite, if I understood well). – tomasyany Jul 24 '17 at 14:56
  • No, the definitions in the paper are correct. Here $Y_i \in \mathcal{Y} = \{0, 1\}^k$ is the ground truth vector of labels for the $i$th sample, and $Z_i = h(\mathbf{x}_i) \in \{0, 1\}^k$ is the predicted vector of labels, where $h$ denotes a multi-label classifier. Perhaps you've mistakenly mixed up the meanings of $Y_i$ and $Z_i$. – constt Sep 07 '17 at 02:41
  • for the `accuracy` measure, how do you elegantly handle cases where the denominator `|Y + Z| == 0`? – ihadanny Jul 03 '18 at 05:57
  • 4
    @tomasyany is referring to the text definitions (not the formulas), which do appear to be switched around. – Narfanar Dec 28 '18 at 09:07
  • And this AP definition looks more like mAP (mean AP), no? What's referred to as 'Accuracy' is the average IoU. The terms are quite a bit confusing overall. – Narfanar Dec 28 '18 at 09:58
  • @Mussri For Average Precision vs. mean Average Precision: [Average Precision in Object Detection](https://stats.stackexchange.com/a/352798/12359) – Franck Dernoncourt Dec 28 '18 at 10:01
  • @FranckDernoncourt , That does support that the definition in the survey is actually for mAP. AP would be the individual sample score, and we get the mean of that. – Narfanar Dec 28 '18 at 12:05
  • @ihadanny, use $I(|Z| = 0)$ in that case, i.e. count the instance as correct only when the prediction is also empty. – Narfanar Dec 28 '18 at 12:06
  • Text descriptions of recall and precision are indeed wrong (although the formulas seem OK). It makes me doubt the quality of this source... – hans Sep 10 '19 at 14:24

The Hamming Loss is probably the most widely used loss function in multi-label classification.

Have a look at Empirical Studies on Multi-label Classification and Multi-Label Classification: An Overview, both of which discuss this.
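As a rough sketch (assuming 0/1 label matrices of shape n_instances × n_labels), the Hamming loss is simply the fraction of individual label assignments that are wrong:

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    # Fraction of label assignments that differ from the ground truth
    # (0 = perfect, 1 = every single label wrong).
    Y_true, Y_pred = np.asarray(Y_true), np.asarray(Y_pred)
    return np.mean(Y_true != Y_pred)

# 2 instances, 4 labels: exactly 1 of the 8 label assignments is wrong.
print(hamming_loss([[1, 0, 0, 0], [0, 1, 1, 0]],
                   [[1, 0, 0, 1], [0, 1, 1, 0]]))  # 0.125
```

(scikit-learn exposes this metric as `sklearn.metrics.hamming_loss`.)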

tdc

"Correctly predicted" is the intersection between the set of suggested labels and the set of expected ones; the "total" is the union of those two sets (with no duplicate counting).

So, given a single example where you predict classes A, G, E and the test case has E, A, H, P as the correct ones, you end up with Accuracy = |{A,G,E} ∩ {E,A,H,P}| / |{A,G,E} ∪ {E,A,H,P}| = 2/5.
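In code, this per-example accuracy is just the Jaccard index of the two label sets; a minimal sketch (the empty-set convention is an assumption on my part):

```python
def instance_accuracy(predicted, expected):
    # Jaccard index: |predicted ∩ expected| / |predicted ∪ expected|.
    predicted, expected = set(predicted), set(expected)
    union = predicted | expected
    # If both sets are empty, count the instance as correct (a common convention).
    return len(predicted & expected) / len(union) if union else 1.0

print(instance_accuracy({'A', 'G', 'E'}, {'E', 'A', 'H', 'P'}))  # 2/5 = 0.4
```

Averaging this score over all test instances then gives an overall accuracy in the sense asked about.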