I have an imbalanced dataset of 3k rows with an 87:13 ratio of positive to negative classes. I am trying to do binary classification. Since the class proportions are skewed, I have to optimize the decision threshold.
I have 3 independent features: feat_a, feat_b, and feat_c. While feat_a has been studied heavily in the literature and is used in practice, we would like to see whether feat_b and feat_c add any value to our prediction model. So I built a model (model a) with feat_a using logistic regression. Now my objective is to build two more models, as shown below (a short code sketch of the setup follows the list):
model b ~ feat_a, feat_b (will have two features)
model c ~ feat_a, feat_b, feat_c (will have three features)
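To make the setup concrete, here is roughly how I am fitting the three nested models (a scikit-learn sketch; `data.csv`, the `label` column name, and the stratified 80/20 split are placeholders for my actual pipeline):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")  # placeholder for my actual data source

# Stratified split so the 87:13 imbalance is preserved in both halves
X_train, X_test, y_train, y_test = train_test_split(
    df[["feat_a", "feat_b", "feat_c"]], df["label"],
    test_size=0.2, stratify=df["label"], random_state=42,
)

# Nested feature sets: each model adds one feature to the previous one
feature_sets = {
    "model_a": ["feat_a"],
    "model_b": ["feat_a", "feat_b"],
    "model_c": ["feat_a", "feat_b", "feat_c"],
}
models = {
    name: LogisticRegression().fit(X_train[cols], y_train)
    for name, cols in feature_sets.items()
}
```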
Now my questions are:
a) How can I compare model a, model b, and model c?
b) Since my dataset is imbalanced, I have to choose appropriate hyperparameters. Should these hyperparameters be the same across all models, given that I want to compare them?
c) Should the decision threshold be the same across all models? I know the default threshold is 0.5, but since my dataset is imbalanced, it is important to optimize the threshold. Should I keep the same decision threshold across the different models? (My current approach to picking the threshold is sketched below.)
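For reference, this is roughly how I am optimizing the threshold at the moment (continuing the sketch above; maximizing F1 on the held-out split is just one example criterion, and I know a separate validation split would be cleaner than reusing the test set):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Predicted positive-class probabilities for one of the models
probs = models["model_a"].predict_proba(X_test[["feat_a"]])[:, 1]

# Sweep all candidate thresholds and keep the one that maximizes F1
precision, recall, thresholds = precision_recall_curve(y_test, probs)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best_threshold = thresholds[np.argmax(f1[:-1])]  # last P/R pair has no threshold

y_pred = (probs >= best_threshold).astype(int)
```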
Can you help me with this?