
For a logistic regression classifier, I create a ROC curve by varying the threshold on the output probability.

Question: can I create an additional ROC curve with a 5% rejection rate based on the classification probability, by rejecting the samples closest to the threshold? This means that every point on the ROC curve will be based on a different set of non-rejected samples. If yes, where can I find a reference paper about it? If no, what is the proper procedure, and where can I read about it?

Lately, someone suggested that instead of rejecting 5% of the testing set, I should reject based on a threshold chosen so that it rejects 5% of the training set. I am not sure the difference is important, but if it is a standard procedure, I would be happy to find a reference for it.
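To make the procedure concrete, here is a minimal numpy sketch of what I mean (the names `y_true` and `p_hat` are just placeholders for the test labels and predicted probabilities): at each candidate threshold, the 5% of samples whose probabilities lie closest to that threshold are dropped before the (FPR, TPR) point is computed, so the retained set indeed changes from point to point.

```python
import numpy as np

def roc_with_rejection(y_true, p_hat, reject_frac=0.05, thresholds=None):
    """Build ROC points where, at each threshold, the `reject_frac` of samples
    whose predicted probability is closest to that threshold is dropped
    before FPR and TPR are computed."""
    y_true, p_hat = np.asarray(y_true), np.asarray(p_hat)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    n_reject = int(round(reject_frac * len(p_hat)))
    points = []
    for t in thresholds:
        # reject the samples nearest the current threshold
        order = np.argsort(np.abs(p_hat - t))
        keep = order[n_reject:]
        y_k, pred = y_true[keep], p_hat[keep] >= t
        tp = np.sum(pred & (y_k == 1))
        fp = np.sum(pred & (y_k == 0))
        fn = np.sum(~pred & (y_k == 1))
        tn = np.sum(~pred & (y_k == 0))
        points.append((fp / (fp + tn) if fp + tn else 0.0,
                       tp / (tp + fn) if tp + fn else 0.0))
    return np.array(points)  # columns: FPR, TPR
```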

Gideon Kogan
  • I know it's not exactly what you want, but you can take a look at a paper about [accuracy-rejection curve](http://proceedings.mlr.press/v8/nadeem10a/nadeem10a.pdf) – Kirill Fedyanin May 01 '20 at 19:47
  • I am familiar with it, thanks – Gideon Kogan May 02 '20 at 07:52
  • 1
    [This paper](https://dl.acm.org/doi/10.1016/j.patrec.2004.09.004) might be of interest, they use a three way split (training, validation, test) for training the model, obtaining thresholds and evaluating the models, respectively. – A Person May 07 '20 at 14:33
  • @APerson, thanks for the paper. It is very closely related to the topic, and in fact what it suggests is similar to what I have proposed. I think the author did not do a very good job, but it is probably as far as the literature goes... – Gideon Kogan May 07 '20 at 18:16
  • What are you trying to accomplish with this re-drawn ROC curve? By my reading, the paper linked by @APerson uses the standard ROC curve to "reject" cases near the cutoff for further testing, based on relative costs of true and false classifications and the cost of the further testing. That doesn't seem to be exactly what you had in mind. In general, one worries about losing information by omitting data from the model, which is what would happen with a re-drawn ROC curve. What is the corresponding advantage that you expect to gain by this procedure? – EdM May 09 '20 at 19:16
  • What do you mean exactly by the verb "reject"? In what context? – carlo May 09 '20 at 20:00
  • @EdM, I am trying to create a ROC curve with improved performance (say, AUC) by implementing sample rejection. The link provided by APerson is the closest thing I have found, though also not exactly what I was looking for. I am not worried about losing information, as the rejection is done after training and is mainly directed at the testing samples. One thing that bothered me was creating a ROC curve with rejection, but it seems that has some reference in the literature. The other thing is the variation of the rejected samples from one point on the ROC curve to another... – Gideon Kogan May 10 '20 at 04:57
  • @carlo, in the context of classification. Some of the samples will be classified and some will be rejected – Gideon Kogan May 10 '20 at 04:59

1 Answer


I have found a partial answer to my questions in Tortorella, Francesco. "A ROC-based reject rule for dichotomizers." Pattern Recognition Letters 26.2 (2005): 167–180 (suggested by @APerson).

Here, for every point on the ROC curve, we define two thresholds, $t_1$ and $t_2$, around the original threshold $t_{opt}$ and reject all the samples that fall between them. This allows improving the performance at every point of the ROC curve.

[Figure: illustration of the reject region between $t_1$ and $t_2$ around $t_{opt}$]
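A minimal sketch of evaluating one such operating point, assuming `y_true` and `p_hat` are the test labels and predicted probabilities and that $t_1 < t_2$ have already been chosen (this is only an illustration, not the paper's cost-based construction):

```python
import numpy as np

def roc_point_with_reject_band(y_true, p_hat, t1, t2):
    """One operating point with a reject band: samples whose predicted
    probability falls strictly between t1 and t2 are rejected; the rest
    are classified positive if p_hat >= t2 and negative otherwise."""
    y_true, p_hat = np.asarray(y_true), np.asarray(p_hat)
    keep = ~((p_hat > t1) & (p_hat < t2))
    pred, y_k = p_hat[keep] >= t2, y_true[keep]
    # FPR and TPR are computed on the non-rejected samples only
    tpr = pred[y_k == 1].mean() if np.any(y_k == 1) else 0.0
    fpr = pred[y_k == 0].mean() if np.any(y_k == 0) else 0.0
    return fpr, tpr, 1.0 - keep.mean()  # also report the reject rate
```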

The only question left is how to define $t_1 = f(t_{opt})$ and $t_2 = f(t_{opt})$. There seem to be many ways to define them, but all of them should be determined on the training set and then applied to the testing set.
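One simple possibility, given here only as an illustration (a symmetric band in probability, not the paper's cost-based rule): choose the band half-width on the training predictions so that roughly the target fraction of training samples falls inside it, then apply the same $t_1$, $t_2$ unchanged to the testing set (`p_hat_train` is a placeholder name).

```python
import numpy as np

def band_from_training(p_hat_train, t_opt, target_reject=0.05):
    """Symmetric reject band (t_opt - w, t_opt + w): w is the training-set
    quantile of |p_hat - t_opt| at the target reject rate, so roughly that
    fraction of training samples falls inside the band."""
    w = np.quantile(np.abs(np.asarray(p_hat_train) - t_opt), target_reject)
    return t_opt - w, t_opt + w
```

The resulting $t_1$, $t_2$ are then held fixed when evaluating the testing set, e.g. with the sketch above.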

Gideon Kogan
  • The paper shows how to choose $t_1$ and $t_2$ based on the relative benefits/costs of true/false positives/negatives, corrected for the costs of rejecting true negatives or positives. Relative costs should always be considered when choosing cutoffs; the usual default classification cutoff at p=0.5 implicitly assumes equal costs of either misclassification type. You might also consider different scoring rules or targeted maximum likelihood as ways to improve performance near a cutoff; see [this page](https://stats.stackexchange.com/q/440764/28500) and links. – EdM May 10 '20 at 18:19
  • Table 1 of the cited paper seems to have a misprint, with costs of rejecting positives/negatives, CRP and CRN, interchanged between the correct rows. The formulas in the text seem OK at first glance. – EdM May 10 '20 at 22:01
  • @EdM, I was only looking at the method, did not check the numbers/results. – Gideon Kogan May 11 '20 at 05:03