Handling features which have the default value in most instances

Question

I am using a Generalized Additive Model to predict a score between 0 and 100.

One of the features in the model is a boolean value which is rarely true. When the value is true, it is a very strong signal that the score should be low. When it is false, the score is not affected by it.

Is there any standard way to incorporate this into the model (specifically the fact that this feature is very important when true but useless when false)? Or is it recommended to just add it as a rule external to the model?

score 0 · Accepted Answer · answered Dec 15 '21 at 17:03

It may be best to build two separate models, one for the rare case and one for the common case. In particular, if the special feature in some sense overrides the other features, mixing the rare cases into the training set will make the model worse.

The only reason not to do this is if a priori you believe that the other features have the same consequences whether or not the special feature is true.
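To make the two-model idea concrete, here is a minimal sketch that routes each row to a model chosen by the boolean flag. It is only an illustration: the helper names (fit_linear, predict_linear, fit_split, predict_split) are made up for this example, and a plain least-squares fit stands in for the GAM so that the code needs nothing beyond NumPy. In practice you would swap in whatever GAM implementation you already use.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept (a stand-in for a GAM or any regressor)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predict with coefficients returned by fit_linear."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def fit_split(X, flag, y):
    """Fit one model on rows where the flag is true and another on the rest."""
    flag = np.asarray(flag, dtype=bool)
    return {
        True: fit_linear(X[flag], y[flag]),
        False: fit_linear(X[~flag], y[~flag]),
    }

def predict_split(models, X, flag):
    """Route each row to the model matching its flag, then clip to the 0-100 range."""
    flag = np.asarray(flag, dtype=bool)
    out = np.empty(len(X))
    if flag.any():
        out[flag] = predict_linear(models[True], X[flag])
    if (~flag).any():
        out[~flag] = predict_linear(models[False], X[~flag])
    return np.clip(out, 0.0, 100.0)

# Synthetic, noise-free data (an assumption for illustration only):
# the flag is true for 1 row in 20 and forces a low score.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 2))
flag = np.arange(500) % 20 == 0
y = np.where(flag, 5 + 2 * X[:, 0], 50 + 30 * X[:, 0] + 10 * X[:, 1])

models = fit_split(X, flag, y)
pred = predict_split(models, X, flag)
```

Note that the rare-case model only sees the rare rows, so it needs enough of them to be fitted reliably. If there are very few, a constant (for example the mean score of the flagged rows) may be all you can justify for that branch.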
