I have a training dataset expressed with binary values, where 1 indicates an attribute is used in an observation, and 0 indicates it is not.
I was wondering whether I should remove an attribute from the training vector if it is used by only a small number of observations (e.g. 1 or 2, out of 2500 training observations in total).
Is there a general rule of thumb for removing attributes based on how often they occur among the training observations?
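
To make the question concrete, here is a minimal sketch of the kind of filtering I have in mind. It assumes the data is a list of equal-length 0/1 rows. The function name `drop_rare_attributes` and the `min_count` threshold are made up for illustration; the threshold value is exactly the thing I am unsure about, not a recommendation.

```python
def drop_rare_attributes(rows, min_count=3):
    """Keep only the attributes (columns) set to 1 in at least `min_count` rows.

    `rows` is assumed to be a list of equal-length lists of 0/1 values.
    Returns the filtered rows and the indices of the columns that were kept.
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    # Occurrence count per attribute: number of observations that use it.
    counts = [sum(row[j] for row in rows) for j in range(n_cols)]
    kept = [j for j, count in enumerate(counts) if count >= min_count]
    filtered = [[row[j] for j in kept] for row in rows]
    return filtered, kept


# Toy data (hypothetical): 5 observations, 4 binary attributes.
X = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 1, 0],
]
X_filtered, kept = drop_rare_attributes(X, min_count=2)
print(kept)  # indices of the attributes that survive the filter
```

In this toy example the second attribute is used by a single observation and the fourth by none, so both would be dropped with `min_count=2`. With my real data the same idea would apply to 2500 rows.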