I have a logistic model fitted with the following R function:

glmfit<-glm(formula, data, family=binomial)

A reasonable cutoff value for obtaining a good classification (confusion matrix) with the fitted model is 0.2 rather than the more commonly used 0.5.

I want to use the cv.glm function (from the boot package) with the fitted model:

cv.glm(data, glmfit, cost, K)

Since the response in the fitted model is a binary variable, an appropriate cost function is (taken from the "Examples" section of ?cv.glm):

cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)

Since my cutoff value is 0.2, can I apply this standard cost function, or should I define a different one, and if so, how?

Thank you very much in advance.

Sycorax
perevales
  • "A reasonable cutoff value in order to get a good data classification (or confusion matrix) with the fitted model is 0.2 instead of the mostly used 0.5." Just curious, but how do you know that 0.2 is a better cutoff than 0.5? – coip Dec 06 '17 at 23:17
  • I very much recommend our earlier thread [Classification probability threshold](https://stats.stackexchange.com/q/312119/1352). – Stephan Kolassa Sep 20 '19 at 11:49

2 Answers

You can define the cost by applying the cutoff to the predicted probability:

cost <- function(r, pi = 0) mean(r != (pi > 0.2))

The logic follows:

  1. If your cutoff is 0.2, then predict an outcome of 1 if pi is greater than 0.2.
  2. Therefore, the proportion of times you are wrong is given by the mean of the logical vector

    r != (pi > 0.2)

    We can arrive at this by looking at both cases where the prediction is wrong:

    if r = 0 and pi > 0.2
    if r = 1 and pi <= 0.2

    In both cases, r != (pi > 0.2) will return the value TRUE, meaning that the prediction is wrong. Note that simply replacing 0.5 with 0.2 in the standard cost, abs(r - pi) > 0.2, is not equivalent: it also returns TRUE for a correct prediction such as r = 1 and pi = 0.5.
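As a sanity check on the reasoning above, the following sketch compares two candidate cost functions on a few made-up values: one that compares r directly with the thresholded prediction, and one that thresholds abs(r - pi) at 0.2. The vectors and the names cost_cutoff and cost_abs are illustrative assumptions, not part of the boot package.

```r
# Hypothetical observed outcomes and predicted probabilities (illustration only)
r  <- c(0, 0, 1, 1, 1)
pi <- c(0.1, 0.3, 0.1, 0.5, 0.9)

# Prediction rule from step 1: predict 1 when pi > 0.2
pred <- as.numeric(pi > 0.2)

# Misclassification rate under that rule
cost_cutoff <- function(r, pi = 0) mean(r != (pi > 0.2))

# Threshold applied to the absolute residual instead
cost_abs <- function(r, pi = 0) mean(abs(r - pi) > 0.2)

mean(r != pred)      # reference: fraction of wrong predictions
cost_cutoff(r, pi)   # agrees with the reference
cost_abs(r, pi)      # additionally counts r = 1, pi = 0.5 as an error
```

The two functions disagree only for observations with r = 1 and pi strictly between 0.2 and 0.8.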

gung - Reinstate Monica
Alex
  • The cutoff comes from the cost function, not vice versa. And the only way a cutoff exists is for the cost function to be identical across all units. – Frank Harrell Sep 20 '19 at 11:43

OK, no answers to my post, but I think I have the answer. All credit goes to @Feng Mai, who wrote a post here: What is the cost function in cv.glm in R's boot package? Thanks to it, here is my answer to my question:

For a cutoff value of 0.2, I think I could apply the following cost function:

 mycost <- function(r, pi) {
   weight1 <- 1  # cost for getting a 1 wrong
   weight0 <- 1  # cost for getting a 0 wrong
   c1 <- (r == 1) & (pi <= 0.2)  # logical vector: TRUE if actual 1 but predicted 0
   c0 <- (r == 0) & (pi > 0.2)   # logical vector: TRUE if actual 0 but predicted 1
   return(mean(weight1 * c1 + weight0 * c0))
 }

And then I would use the cv.glm function with the fitted model and mycost function:

cv.glm(data, glmfit, cost=mycost, K)
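For reference, here is a minimal end-to-end sketch of that call. It assumes simulated data (the data frame dat, the predictor x, and the coefficients are made up for illustration and are not my actual data), and it treats pi exactly equal to 0.2 as a predicted 0.

```r
library(boot)  # provides cv.glm

set.seed(1)

# Simulated data for illustration only
n   <- 200
x   <- rnorm(n)
y   <- rbinom(n, 1, plogis(-1.5 + x))
dat <- data.frame(x = x, y = y)

glmfit <- glm(y ~ x, data = dat, family = binomial)

# Misclassification rate at a 0.2 cutoff, with equal weights for both error types
mycost <- function(r, pi) {
  c1 <- (r == 1) & (pi <= 0.2)  # actual 1, predicted 0
  c0 <- (r == 0) & (pi > 0.2)   # actual 0, predicted 1
  mean(c1 + c0)
}

cv_out <- cv.glm(dat, glmfit, cost = mycost, K = 10)
cv_out$delta  # raw and bias-adjusted cross-validated error estimates
```

Naming the arguments (cost = mycost, K = 10) avoids relying on positional matching.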

Hopefully this works. Am I right?

perevales
  • I think that it is not proper to do this unless the cost function has been specified by subject matter experts. It is not a statistical quantity, and often varies with subjects. – Frank Harrell Jan 30 '14 at 13:19