I am evaluating ML models (GBDTs) on various test sets using precision-recall (PR) curves. My goal is to get recall as high as possible within a given precision range.

On most of my test sets, the precision-recall curve looks like a line running from top-left to bottom-right (see curve1 below). On some test sets, however, the curve has a very flat region (see curve2 below).

What does that imply? I understand that precision has not changed much over a large range of recall, but I want a deeper understanding. For example, does the flat region imply something about the model's predicted scores, or something unusual about the test set?

Thanks!

curve1:

PR CURVE 1

curve2, which has a very flat region:

Precision recall curve

Zero Liu

1 Answer

Flat regions in a PR curve are, generally speaking, "good". They imply that we can increase recall (i.e. recognise additional true positive instances) without lowering precision: the instances admitted as the threshold drops contain true positives and false positives in roughly the same proportion as those already selected, so false positives do not accumulate faster than true positives.
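To make the plateau concrete, here is a minimal sketch in plain Python. It is not the asker's pipeline; the helper `pr_points` and the toy labels and scores are hypothetical. It traces precision and recall while the threshold moves down a ranked list, and shows that precision holds steady whenever each newly admitted batch has the same mix of positives and negatives as the instances already selected.

```python
def pr_points(labels, scores):
    """Trace (recall, precision) as the threshold is lowered one instance at a time.

    Assumes binary labels (1 = positive), at least one positive, and no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


# Toy ranking (hypothetical): every block of four consecutive instances holds
# three positives and one negative, followed by a tail of negatives.
labels = [1, 1, 1, 0] * 3 + [0] * 4
scores = [len(labels) - i for i in range(len(labels))]  # strictly decreasing

points = pr_points(labels, scores)
plateau = [points[i] for i in (3, 7, 11)]  # end of each block of four
for recall, precision in plateau:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```

At the end of each block, precision stays at 0.75 while recall rises from 0.33 to 1.00. Within a block, precision wobbles slightly, which produces a sawtooth pattern on small samples.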

A straight line in a PR curve is at times associated with clusters in the underlying data. For example, in curve 1 it is likely that about 20% of the test set has a very informative characteristic that allows us to easily distinguish it as being positive. It might suggest something like "complete separation"; see for example the CV.SE thread on: How to deal with perfect separation in logistic regression? for more details on that notion. This is not necessarily destructive for our algorithm.

Similarly, this flatness might be exaggerated in cases of a severely imbalanced dataset. In such cases, our algorithm might not be that great, but purely because of random sampling variation we hit a small cluster of "easy instances" and get such a plateau. (In fairness, this usually gives sawtooth patterns; I discuss this a bit in the thread: Starting point of the PR-curve and the AUCPR value for an ideal classifier.)

In any case, I would recommend using a cross-validation scheme to perform model selection, as well as plotting the baseline of the corresponding PR curve (see the CV.SE thread on: What is "baseline" in precision recall curve), to get a more realistic view of a classifier's performance.
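As a rough illustration of the baseline: a classifier that ranks instances at random has an expected precision equal to the share of positives in the test set, at every recall level. The sketch below computes that baseline and compares a plateau height against it. The helper `pr_baseline`, the label counts and the plateau value are all hypothetical, chosen only to show the arithmetic.

```python
def pr_baseline(labels):
    """Expected precision of a random ranking: the share of positives."""
    return sum(labels) / len(labels)


# Hypothetical test set: 10 positives among 80 instances.
labels = [1] * 10 + [0] * 70
baseline = pr_baseline(labels)

# Hypothetical plateau height read off a PR curve for this test set.
plateau_precision = 0.75
lift = plateau_precision / baseline

print(f"baseline precision = {baseline}")
print(f"plateau lift over baseline = {lift}")
```

Because the baseline depends on class balance, the same plateau height can mean very different things on two test sets. Comparing curves across test sets is therefore easier when each one is drawn alongside its own baseline.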

usεr11852