
Say I have the following in R:

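rm(list = ls())
set.seed(1000)
n <- 20
# First class: centred at (0.5, 0.5), plotted with colour 2 (red)
x <- rnorm(n, 0.5, 1)
y <- rnorm(n, 0.5, 1)
type <- rep(2, n)
df1 <- data.frame(x, y, type)
# Second class: centred at (-0.5, -0.5), plotted with colour 5
x <- rnorm(n, -0.5, 1)
y <- rnorm(n, -0.5, 1)
type <- rep(5, n)
df0 <- data.frame(x, y, type)
# Full outer join on all shared columns, which here stacks the two classes
df <- merge(x = df0, y = df1, all = TRUE)
plot(df$x, df$y, col = df$type)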


As you can see, the two classes overlap. Say the cost of misclassifying the red class is 10 times the cost of misclassifying the blue class.

Say I want to use logistic regression -- how would I incorporate this cost into my logistic regression model?

Sycorax
user1172468
  • Why would you incorporate the cost in the model at all? Doesn't cost relate to the question of *sample design* when you are prospectively contemplating obtaining data? If not, then could you elaborate on what these costs actually reflect? – whuber May 20 '16 at 20:11
  • With a logistic regression classifier, usually one chooses a threshold to dichotomize predictions into positive and negative. Are you asking for a way to incorporate this cost into finding an optimal threshold or are you hoping to obtain another binary classifier altogether? – AdamO May 20 '16 at 20:18
  • @whuber, I guess that was my question -- is the cost function solely employed on the probabilities the model spits out when scoring? – user1172468 May 21 '16 at 23:21
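
For reference, the threshold route raised in the comments might look roughly like the sketch below. It is only an illustration under stated assumptions, not an established answer to the question: the model is fitted without any cost, and the 10:1 cost ratio is applied afterwards to the predicted probabilities. The names `is_red`, `cost_red_missed`, `cost_blue_missed` and `total_cost` are made up for this example, and the rule assumes the fitted probabilities are reasonably well calibrated.

```r
# Simulate two overlapping classes, in the same spirit as the question
set.seed(1000)
n <- 20
red  <- data.frame(x = rnorm(n, 0.5, 1),  y = rnorm(n, 0.5, 1),  is_red = 1)
blue <- data.frame(x = rnorm(n, -0.5, 1), y = rnorm(n, -0.5, 1), is_red = 0)
df <- rbind(red, blue)

# Ordinary (unweighted) logistic regression for P(red | x, y)
fit <- glm(is_red ~ x + y, data = df, family = binomial)
p_red <- predict(fit, type = "response")

# Assumed costs: a misclassified red point costs 10, a misclassified blue point costs 1
cost_red_missed  <- 10
cost_blue_missed <- 1

# Predict red when the expected cost of predicting blue exceeds that of
# predicting red:  p * cost_red_missed > (1 - p) * cost_blue_missed,
# which rearranges to p > cost_blue_missed / (cost_blue_missed + cost_red_missed)
threshold <- cost_blue_missed / (cost_blue_missed + cost_red_missed)
pred_red <- as.integer(p_red > threshold)

# Total misclassification cost of a set of predictions (hypothetical helper)
total_cost <- function(pred, truth) {
  sum(cost_red_missed  * (truth == 1 & pred == 0) +
      cost_blue_missed * (truth == 0 & pred == 1))
}

# In-sample cost with the cost-based threshold versus the usual 0.5 cut-off
cost_weighted <- total_cost(pred_red, df$is_red)
cost_default  <- total_cost(as.integer(p_red > 0.5), df$is_red)
```

With these assumed costs the cut-off drops from 0.5 to 1/11, so more points are labelled red. Whether that actually lowers the total cost on new data would need to be checked on held-out observations rather than in-sample.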
