In R, how do you modify a logistic regression when the cost of misclassifying one of the classes is much higher than the other?

Asked May 20 '16 at 19:56

Active Sep 12 '20 at 16:28


Say I have the following in R:

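rm(list = ls())
set.seed(1000)
n <- 20
# First class: centred at (0.5, 0.5), plotted with col = 2 (red)
x <- rnorm(n, 0.5, 1)
y <- rnorm(n, 0.5, 1)
type <- rep(2, n)
df1 <- data.frame(x, y, type)
# Second class: centred at (-0.5, -0.5), plotted with col = 5
x <- rnorm(n, -0.5, 1)
y <- rnorm(n, -0.5, 1)
type <- rep(5, n)
df0 <- data.frame(x, y, type)
# Stack the two classes; rbind() appends rows directly
df <- rbind(df0, df1)
plot(df$x, df$y, col = df$type)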

As you can see, the two classes overlap. Say the cost of misclassifying the red class is 10 times the cost of misclassifying the blue class.

Say I want to use a logistic regression. How would I incorporate the cost into my logistic regression model?

edited Sep 12 '20 at 16:28 by Sycorax

asked May 20 '16 at 19:56 by user1172468


Why would you incorporate the cost in the model at all? Doesn't cost relate to the question of *sample design* when you are prospectively contemplating obtaining data? If not, then could you elaborate on what these costs actually reflect? – whuber May 20 '16 at 20:11

With a logistic regression classifier, usually one chooses a threshold to dichotomize predictions into positive and negative. Are you asking for a way to incorporate this cost into finding an optimal threshold or are you hoping to obtain another binary classifier altogether? – AdamO May 20 '16 at 20:18
@whuber, I guess that was my question -- is the cost function solely employed on the probabilities the model spits out when scoring? – user1172468 May 21 '16 at 23:21
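
The thresholding idea raised in the comments can be sketched as follows. This is only an illustration under stated assumptions, not an answer from the thread: it fits an ordinary, unweighted logistic regression and applies the 10:1 cost ratio afterwards, when the predicted probabilities are turned into class labels. The names `is_red`, `cost_miss_red`, `cost_miss_blue` and `total_cost` are hypothetical, and the rule assumes the fitted probabilities are reasonably well calibrated.

```r
# Sketch: cost-sensitive thresholding of a plain logistic regression.
set.seed(1000)
n <- 20
red  <- data.frame(x = rnorm(n,  0.5, 1), y = rnorm(n,  0.5, 1), is_red = 1)
blue <- data.frame(x = rnorm(n, -0.5, 1), y = rnorm(n, -0.5, 1), is_red = 0)
df <- rbind(red, blue)

# Unweighted fit: the costs do not enter the likelihood here.
fit <- glm(is_red ~ x + y, family = binomial, data = df)
p_red <- predict(fit, type = "response")  # estimated P(red | x, y)

# Assumed costs (hypothetical values based on the 10:1 ratio in the question).
cost_miss_red  <- 10  # a red point labelled blue
cost_miss_blue <- 1   # a blue point labelled red

# Label a point red when the expected cost of calling it blue is larger:
#   p * cost_miss_red > (1 - p) * cost_miss_blue
# which rearranges to p > cost_miss_blue / (cost_miss_blue + cost_miss_red).
threshold <- cost_miss_blue / (cost_miss_blue + cost_miss_red)  # 1/11
pred_red <- as.integer(p_red > threshold)

# Total misclassification cost of a set of labels (in-sample, for illustration).
total_cost <- function(pred, truth) {
  sum(cost_miss_red  * (truth == 1 & pred == 0) +
      cost_miss_blue * (truth == 0 & pred == 1))
}
cost_at_default <- total_cost(as.integer(p_red > 0.5), df$is_red)
cost_at_rule    <- total_cost(pred_red, df$is_red)

table(truth = df$is_red, predicted = pred_red)
```

Comparing `cost_at_default` with `cost_at_rule` shows how much the lower cut-off changes the total cost on this sample. The design choice here is to leave the model itself untouched and let the costs act only on the decision rule, which is the first of the two options AdamO describes.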
