
I have 4 features in my dataset. Let's say one feature is currently used by doctors in the hospital (along with their domain expertise) to make informed decisions.

If I build a binary classification model with just that one feature, it gives me an F1-score of 50%.

But I, as a non-clinician but a data guy, did some exploration of the data, did some feature engineering, and came up with 3 new features. Now, when I use these 3 new features along with the one old feature, my model gives me an F1-score of 53%.

So, should I conclude that these 3 new features add value to my model and help improve its predictions? How can I justify that these 3 new features are useful?

As a layman, I see that the F1-score increased from 50% to 53% under the same settings (same algorithm, same hyperparameters, etc.). Am I right to understand that it is good to add these features?

My questions are:

a) Should I add these new features or not? How do I determine whether they are really adding value and not just improving by chance? (I ran the experiment multiple times, and every time the addition of the 3 new features improved the F1-score by approximately 3 points; see the sketch below the questions.)

b) Or should I drop the whole project, because an F1-score of 50% is already bad and good for nothing?
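To make question (a) concrete, here is a minimal sketch of the kind of check I have in mind, using scikit-learn with repeated cross-validation so both feature sets are scored on identical folds. The column names and the synthetic data are placeholders, not my real hospital data:

```python
# Minimal sketch: compare the 1-feature model against the 4-feature model
# over the same repeated CV folds. All column names and the synthetic data
# below are placeholders standing in for the real dataset.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "doctor_feature": rng.normal(size=n),  # the feature doctors already use
    "eng_1": rng.normal(size=n),           # the 3 engineered features
    "eng_2": rng.normal(size=n),
    "eng_3": rng.normal(size=n),
})
df["label"] = (df["doctor_feature"] + df["eng_1"] + rng.normal(size=n) > 0).astype(int)

X_old = df[["doctor_feature"]]
X_new = df[["doctor_feature", "eng_1", "eng_2", "eng_3"]]
y = df["label"]

# A fixed random_state means both calls iterate over identical folds,
# so the per-fold scores are paired.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
model = LogisticRegression(max_iter=1000)

f1_old = cross_val_score(model, X_old, y, cv=cv, scoring="f1")
f1_new = cross_val_score(model, X_new, y, cv=cv, scoring="f1")

# If the improvement is real rather than chance, the paired per-fold
# differences should sit clearly above zero, not straddle it.
diff = f1_new - f1_old
print(f"F1 old: {f1_old.mean():.3f}  F1 new: {f1_new.mean():.3f}")
print(f"mean improvement: {diff.mean():.3f} (std {diff.std():.3f})")
print(f"fraction of folds where the new features win: {(diff > 0).mean():.2f}")
```

My understanding is that if the distribution of `diff` hugs zero, the 3-point gain is probably noise, whereas if it is consistently positive across folds and repeats, that supports adding the features.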

– The Great
  • $53$ is more than $50$, so I would be inclined to say that your model is an improvement, assuming you are getting that $F_1$ score on out-of-sample data. Watch out for threshold-based metrics like accuracy, sensitivity, specificity, and $F_1$, though. Frank Harrell describes them as being optimized by a "bogus" model; see his blog posts: https://www.fharrell.com/post/class-damage/ and https://www.fharrell.com/post/classification/ – Dave Jun 02 '21 at 09:28
  • Also relevant: https://stats.stackexchange.com/questions/414349/is-my-model-any-good-based-on-the-diagnostic-metric-r2-auc-accuracy-rmse – mkt Jun 02 '21 at 09:30
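Following Dave's comment, a hedged sketch of what the same comparison could look like with proper scoring rules (log loss and Brier score) computed on out-of-fold predicted probabilities, instead of the threshold-based $F_1$; the data and column names are again placeholders:

```python
# Hedged sketch, per Dave's comment: evaluate the same two models with
# proper scoring rules (log loss, Brier score) on out-of-fold probabilities
# rather than the threshold-based F1. Data and column names are placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "doctor_feature": rng.normal(size=n),
    "eng_1": rng.normal(size=n),
    "eng_2": rng.normal(size=n),
    "eng_3": rng.normal(size=n),
})
df["label"] = (df["doctor_feature"] + df["eng_1"] + rng.normal(size=n) > 0).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

for name, cols in [("one feature  ", ["doctor_feature"]),
                   ("four features", ["doctor_feature", "eng_1", "eng_2", "eng_3"])]:
    # Out-of-fold predicted probabilities for the positive class.
    proba = cross_val_predict(model, df[cols], df["label"],
                              cv=cv, method="predict_proba")[:, 1]
    # Lower is better for both proper scoring rules.
    print(f"{name}: log loss {log_loss(df['label'], proba):.3f}, "
          f"Brier {brier_score_loss(df['label'], proba):.3f}")
```

If the engineered features also lower the log loss and Brier score, that is stronger evidence that they carry real information than a 3-point $F_1$ bump at a single threshold.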

0 Answers