
I have an imbalanced dataset of 3k rows with an 87:13 ratio of positive to negative classes, and I am trying to do binary classification. Since the class proportions are skewed, I have to optimize the decision threshold.

I have three independent features: feat_a, feat_b, and feat_c. While feat_a is studied heavily in the literature and used in practice, we would like to see whether feat_b and feat_c add any value to our prediction model. So I built a model (model a) with feat_a using logistic regression. My objective now is to build two more models, as shown below (with a rough code sketch after the list):

model b ~ feat_a + feat_b (two features)

model c ~ feat_a + feat_b + feat_c (three features)
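
For concreteness, here is a rough sketch of that setup (scikit-learn; the data is simulated only so the example is self-contained, since I can't share mine):

```python
# Rough sketch of the three nested logistic-regression models.
# The data below is simulated only so the example runs end to end.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({
    "feat_a": rng.normal(size=n),
    "feat_b": rng.normal(size=n),
    "feat_c": rng.normal(size=n),
})
# Simulated labels with roughly an 87:13 split, driven mostly by feat_a.
y = (df["feat_a"] + 0.3 * df["feat_b"] + rng.normal(size=n) > 1.5).astype(int)

feature_sets = {
    "model_a": ["feat_a"],
    "model_b": ["feat_a", "feat_b"],
    "model_c": ["feat_a", "feat_b", "feat_c"],
}
models = {
    name: LogisticRegression(max_iter=1000).fit(df[cols], y)
    for name, cols in feature_sets.items()
}
```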

Now my questions are:

a) How can I compare model a, model b, and model c?

b) Since my dataset is imbalanced, I have to choose appropriate hyperparameters. Should these hyperparameters be the same across all models, given that I want to compare them?

c) Should the decision threshold be the same across all models? I know the default threshold is 0.5, but since my dataset is imbalanced, it is important to optimize the threshold. Should I keep the same decision threshold across the different models?

Can you help me with this?


1 Answer


Use logistic regression. That gives you an estimated probability of class membership, not a hard classification. Then, if you need a crisp decision at a later point, use a loss function based on real (say, economic) losses from wrong decisions.
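
A minimal sketch of this idea (scikit-learn; the cost numbers are made-up placeholders standing in for your real economic losses):

```python
# Turn predicted probabilities into decisions by minimizing expected loss.
# The two cost values below are assumed placeholders, not real numbers.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=3000, weights=[0.87], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)[:, 1]  # estimated P(y = 1), not a hard label

cost_fp = 1.0  # loss from acting on a false positive (assumed)
cost_fn = 5.0  # loss from missing a true positive (assumed)

# Decide "positive" when its expected loss is smaller:
#   (1 - p) * cost_fp  <  p * cost_fn
# which rearranges to p > cost_fp / (cost_fp + cost_fn).
threshold = cost_fp / (cost_fp + cost_fn)
decisions = probs > threshold
print(f"threshold = {threshold:.3f}, positives flagged = {decisions.mean():.1%}")
```

Note that the threshold falls out of the losses themselves; nothing forces it to be 0.5, and the same decision rule can be applied unchanged to each of your models.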

There are many similar posts, for instance
