I have a training dataset expressed with binary values, where 1 indicates an attribute is used in an observation, and 0 indicates it is not.

Should I remove an attribute from the training vector if it is used by only a small number of observations (e.g. 1 or 2, out of 2500 training observations in total)?

Is there a general rule of thumb for removing attributes based on how frequently they occur among the training observations?
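To make the question concrete, here is a minimal sketch of the kind of filtering I have in mind. The function name `drop_rare_attributes`, the threshold `min_count`, and the toy data are all hypothetical and only for illustration; the right threshold is exactly what I am asking about.

```python
from typing import List, Tuple


def drop_rare_attributes(
    rows: List[List[int]], min_count: int
) -> Tuple[List[List[int]], List[int]]:
    """Keep only the columns whose number of 1s is at least min_count.

    Returns the filtered rows and the indices of the columns that were kept.
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    # Number of observations in which each attribute is used (value 1).
    counts = [sum(row[j] for row in rows) for j in range(n_cols)]
    kept = [j for j, c in enumerate(counts) if c >= min_count]
    filtered = [[row[j] for j in kept] for row in rows]
    return filtered, kept


# Toy data (hypothetical): 6 observations, 4 binary attributes.
X = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]

# min_count=2 is an arbitrary illustrative threshold, not a recommendation.
X_filtered, kept_columns = drop_rare_attributes(X, min_count=2)
print(kept_columns)  # [0, 2]
```

Here attributes 1 and 3 are dropped because they are used by fewer than 2 observations. With my real data the same idea would apply to 2500 rows.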

GabiX
  • Could you explain the sense of "used by"? After all, for a training set to work, it mustn't have any missing values at all (unless the model specifically imputes those values). – whuber Oct 21 '21 at 19:44
  • Hi, sorry for the confusion. I added more detail to the question; does it make more sense now? – GabiX Oct 21 '21 at 20:04

0 Answers