
I used classification_report from the sklearn library.

The picture below shows the evaluation of my model (an anomaly detector).

[Image: classification_report output for the anomaly detector]
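For reference, here is a minimal sketch of how such a report can be produced; the labels below are made-up placeholders, not my actual model outputs.

```python
from sklearn.metrics import classification_report

# Made-up placeholder labels (1 = anomalous), only to illustrate the report layout
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

# Prints per-class precision/recall/F1 plus the "accuracy", "macro avg", and "weighted avg" rows
print(classification_report(y_true, y_pred, target_names=["normal", "anomaly"], digits=4))
```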

In general, which precision, recall, and F1 values are reported in papers?

I think it's reasonable to report the macro-averaged precision and recall (in my case, 0.5001 and 0.7000).

So, when writing a paper, can I report these values?

Otherwise, which precision, recall, and F1 values are reported in papers?

1 Answer


Good precision, recall, and F1 score values depend heavily on your application. For example, if you are trying to detect cancer cells in patients, you may be more interested in the number of false negatives than in the number of false positives. A false negative could lead to the death of a patient, while the worst that can happen in the case of a false positive is an upset patient.

Since you care more about minimizing false negatives than minimizing false positives, you would pay more attention to recall than precision. In other applications, false positives may be more important than false negatives. Therefore, you should check for typical values in papers in your application area.
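To make the distinction concrete, here is a small sketch on made-up binary labels (1 = the class of interest): precision penalizes false positives, while recall penalizes false negatives.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up binary labels, purely illustrative
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp / (tp + fp), precision_score(y_true, y_pred))  # precision = TP / (TP + FP)
print(tp / (tp + fn), recall_score(y_true, y_pred))     # recall    = TP / (TP + FN)
```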

Since you are evaluating the performance of an anomaly detector, there are two classes: anomalous and non-anomalous. This is a binary classification problem, so I am not sure how macro and weighted averages are relevant here.

In case I am wrong and this is a multi-class classification problem, deciding whether to use the macro average or weighted average will again depend on what is normally done in your application area. This answer provides an informative overview.
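To illustrate the difference on made-up, imbalanced binary labels: the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support.

```python
from sklearn.metrics import classification_report, f1_score

# Made-up, imbalanced binary labels: 0 = normal (90 samples), 1 = anomalous (10 samples)
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 85 + [1] * 5 + [1] * 5 + [0] * 5

print(classification_report(y_true, y_pred, digits=4))

# "macro avg": unweighted mean of the two per-class F1 scores
print(f1_score(y_true, y_pred, average="macro"))
# "weighted avg": per-class F1 scores weighted by class support (90 vs. 10)
print(f1_score(y_true, y_pred, average="weighted"))
```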

mhdadk
  • +1 You can take the critical importance of relative false-positive and false-negative costs even farther, to argue that precision, recall, F1 scores, etc., are the _last_ things one should be concerned with. Having a well-calibrated probability model should come first, followed (if actually necessary) by choice of a probability cutoff that minimizes overall cost. I suspect that the OP used a default of a 50% probability cutoff, unconsciously giving equal weight to false positives and false negatives. See [this page](https://stats.stackexchange.com/q/312780/28500) for example. – EdM Jul 29 '21 at 15:31
  • I think the OP is more concerned with whether papers typically report "macro avg" or "weighted avg" precision, recall, F1 - not with what "good values" of all these KPIs are. – Stephan Kolassa Jul 29 '21 at 15:58
  • @StephanKolassa Fair enough. This wasn't initially clear to me. I have edited my answer to try to address this. – mhdadk Jul 29 '21 at 16:08
  • @mhdadk Thank you for your explanation. So, is it also reasonable to compute and report the average precision/recall from the confusion matrix for the anomaly class and the confusion matrix for the normal class? – Dae-Young Park Jul 30 '21 at 05:21
  • @StephanKolassa Um.. so I think I can report precision/recall with macro avg, right? – Dae-Young Park Jul 30 '21 at 05:22
  • @EdM Sorry, what is "OP"? It is difficult to understand your explanation. – Dae-Young Park Jul 30 '21 at 05:23
  • "OP" = "Original Poster", the one who asked the original question, in this case: you. To your question: I personally am not familiar with these two ways of averaging, sorry. I would recommend against using *any* of these, because every criticism against accuracy [on this page](https://stats.stackexchange.com/q/312780/1352) that EdM already linked to applies equally to precision, recall and F1. – Stephan Kolassa Jul 30 '21 at 06:56
  • @StephanKolassa Do you mean I should report both anomaly pre/re/f1 and normal point pre/re/f1 separately ? – Dae-Young Park Jul 30 '21 at 07:12
  • By macro and weighted averages, do you mean that you pick different probability thresholds, compute the re/pre/F1 for each threshold, and then average? – mhdadk Jul 30 '21 at 07:17
  • @mhdadk Yes, maybe (in my case, 0.5001, 0.7000, 0.4739) – Dae-Young Park Jul 30 '21 at 08:07
  • You may be interested in the [mean average precision](https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173), which is a standard evaluation metric in the object detection literature. – mhdadk Jul 30 '21 at 09:27