Rule of thumb for removing or keeping an attribute based on its occurrence frequency among training observations?

Asked Oct 20 '21 at 17:54

Active Oct 21 '21 at 20:03

I have a training dataset expressed with binary values, where 1 indicates an attribute is used in an observation, and 0 indicates it is not.

I was wondering whether I should remove an attribute from the training vector if it is used by only a small number of observations (e.g. 1 or 2, assuming we have 2500 training observations in total).

Is there a general rule of thumb for removing an attribute based on its occurrence frequency among the training observations?
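
For concreteness, the kind of filtering being asked about could be sketched as follows. This is only an illustration under stated assumptions: the function name `drop_rare_attributes`, the toy data, and the `min_count=3` cutoff are all hypothetical and are not a recommended threshold.

```python
import numpy as np

def drop_rare_attributes(X, min_count):
    """Keep only the columns of a 0/1 matrix that equal 1 in at least
    `min_count` rows. Returns the reduced matrix and the column mask."""
    X = np.asarray(X)
    counts = X.sum(axis=0)      # number of observations using each attribute
    keep = counts >= min_count  # boolean mask over attributes (columns)
    return X[:, keep], keep

# Toy data (made up): 2500 observations, 3 binary attributes.
X = np.zeros((2500, 3), dtype=int)
X[:500, 0] = 1  # attribute 0 is used by 500 observations
X[:2, 1] = 1    # attribute 1 is used by 2 observations
X[:1, 2] = 1    # attribute 2 is used by 1 observation

# Hypothetical cutoff: drop attributes used by fewer than 3 observations.
X_reduced, keep = drop_rare_attributes(X, min_count=3)
print(keep)             # [ True False False]
print(X_reduced.shape)  # (2500, 1)
```

Whether such a cutoff is appropriate at all, and what value it should take, is exactly what the question is asking.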

edited Oct 21 '21 at 20:03

asked Oct 20 '21 at 17:54

GabiX

Could you explain the sense of "used by"? After all, for a training set to work, it mustn't have any missing values at all (unless the model specifically imputes those values). – whuber Oct 21 '21 at 19:44
Hi, sorry for the confusion. I added more detail to the question; does it make more sense now? – GabiX Oct 21 '21 at 20:04

0 Answers