Cost function in cv.glm for a fitted logistic model when the cutoff value of the model is not 0.5

Question

I have a logistic model fitted with the following R function:

glmfit<-glm(formula, data, family=binomial)

A reasonable cutoff value in order to get a good data classification (or confusion matrix) with the fitted model is 0.2 instead of the mostly used 0.5.

And I want to use the cv.glm function with the fitted model:

cv.glm(data, glmfit, cost, K)

Since the response in the fitted model is a binary variable, an appropriate cost function (taken from the "Examples" section of ?cv.glm) is:

cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)

Since my cutoff value is 0.2, can I apply this standard cost function, or should I define a different one, and if so, how?

Thank you very much in advance.

"A reasonable cutoff value in order to get a good data classification (or confusion matrix) with the fitted model is 0.2 instead of the mostly used 0.5." Just curious, but how do you know that 0.2 is a better cutoff than 0.5? — coip, Dec 06 '17 at 23:17
I very much recommend our earlier thread [Classification probability threshold](https://stats.stackexchange.com/q/312119/1352). — Stephan Kolassa, Sep 20 '19 at 11:49

score 1 · Answer 1 · edited Sep 20 '19 at 11:38

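You can encode the cutoff directly in the cost function:

cost <- function(r, pi = 0) mean(r != (pi > 0.2))

The logic is as follows.

If your cutoff is 0.2, then you predict an outcome of 1 whenever pi is greater than 0.2.
Therefore, the number of times you are wrong is given by summing the logical vector
```
r != (pi > 0.2)
```
This is TRUE in exactly the two cases where the prediction is wrong:
```
r = 0 and pi > 0.2
r = 1 and pi <= 0.2
```
Note that simply replacing 0.5 with 0.2 in the standard function, i.e. abs(r - pi) > 0.2, does not give the same result. That expression is also TRUE when r = 1 and pi lies between 0.2 and 0.8, even though those cases are classified correctly at a 0.2 cutoff. The abs() shortcut only works at 0.5 because that cutoff is symmetric around both outcomes.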
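A quick way to sanity-check a cutoff-based cost function is to evaluate it on a few hand-picked values where the correct classification is obvious. The following is a minimal sketch on made-up data; the names `cost_cutoff` and `cost_abs` are illustrative and do not come from the boot package.

```r
# Hypothetical toy data: observed outcomes and predicted probabilities.
r  <- c(0,   0,   1,   1,   1)
pi <- c(0.1, 0.3, 0.1, 0.5, 0.9)

cutoff <- 0.2

# Explicit threshold: predict 1 when pi > cutoff, then count mismatches.
cost_cutoff <- function(r, pi = 0) mean(r != (pi > cutoff))

# abs() form from ?cv.glm with 0.5 replaced by the cutoff.
cost_abs <- function(r, pi = 0) mean(abs(r - pi) > cutoff)

cost_cutoff(r, pi)  # 0.4: observations 2 and 3 are misclassified
cost_abs(r, pi)     # 0.6: observation 4 (r = 1, pi = 0.5) is also counted
```

The two versions disagree only on observations with r = 1 and pi between 0.2 and 0.8, which a 0.2 cutoff classifies correctly.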

answered Apr 07 '15 at 10:28 by Alex; edited Sep 20 '19 at 11:38 by gung - Reinstate Monica

The cutoff comes from the cost function, not vice versa. And the only way a cutoff exists is for the cost function to be identical across all units. – Frank Harrell Sep 20 '19 at 11:43
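One way to read this comment: if the costs of the two kinds of error are known and the same for every observation, the cutoff follows from them. The sketch below uses made-up costs (`c_fp` and `c_fn` are assumptions for illustration, not values from this thread) to show which cost ratio a 0.2 cutoff would correspond to.

```r
# Hypothetical misclassification costs (assumptions for illustration).
c_fp <- 1  # cost of predicting 1 when the truth is 0
c_fn <- 4  # cost of predicting 0 when the truth is 1

# For predicted probability p, predicting 1 has expected cost (1 - p) * c_fp
# and predicting 0 has expected cost p * c_fn.
# Predicting 1 is the cheaper choice when p exceeds this threshold:
implied_cutoff <- c_fp / (c_fp + c_fn)
implied_cutoff  # 0.2
```

Under these assumptions, a cutoff of 0.2 amounts to treating a missed 1 as four times as costly as a false alarm.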

score 1 · Answer 2 · edited Apr 13 '17 at 12:44

There were no answers to my post, but I think I have found one. All credit goes to @Feng Mai, who wrote the post "What is the cost function in cv.glm in R's boot package?". Based on it, here is my answer to my own question:

For a cutoff value of 0.2, I think I could apply the following cost function:

    mycost <- function(r, pi) {
      weight1 <- 1  # cost for getting a 1 wrong
      weight0 <- 1  # cost for getting a 0 wrong
      c1 <- (r == 1) & (pi <= 0.2)  # logical vector: TRUE if actual 1 but predicted 0
      c0 <- (r == 0) & (pi > 0.2)   # logical vector: TRUE if actual 0 but predicted 1
      mean(weight1 * c1 + weight0 * c0)
    }

And then I would use the cv.glm function with the fitted model and mycost function:

cv.glm(data, glmfit, cost=mycost, K)

Hopefully this might work. Am I right?
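To see the whole workflow run end to end, here is a minimal sketch on simulated data. The data set, the formula `y ~ x`, and the name `cost02` are assumptions made for illustration; only `glm` and `cv.glm` (from the boot package) are the functions discussed in this thread.

```r
library(boot)  # provides cv.glm

set.seed(1)

# Simulated data (an assumption for illustration): a fairly rare binary outcome.
n   <- 200
x   <- rnorm(n)
y   <- rbinom(n, 1, plogis(-1.5 + x))
dat <- data.frame(x = x, y = y)

glmfit <- glm(y ~ x, data = dat, family = binomial)

# Misclassification rate at a 0.2 cutoff (predict 1 when pi > 0.2).
cost02 <- function(r, pi = 0) mean(r != (pi > 0.2))

# 10-fold cross-validation; cv.glm passes the observed responses and the
# predicted probabilities for each held-out fold to the cost function.
cv <- cv.glm(dat, glmfit, cost = cost02, K = 10)
cv$delta  # raw and bias-adjusted cross-validated error estimates
```

The first element of `delta` is the raw cross-validated misclassification rate at the chosen cutoff; the second is the adjusted estimate described in ?cv.glm.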

I think that it is not proper to do this unless the cost function has been specified from subject matter experts. It is not a statistical quantity, and often varies with subjects. — Frank Harrell, Jan 30 '14 at 13:19
