I have a text sentiment classification model trained with a linear SVM on 2,500 training instances with around 14,000 features (words). Every sample is represented as a binary vector, where 1 indicates the presence of a word and 0 indicates its absence in that sample.
I was wondering whether we can infer that the linear SVM model is overfitting when a large number of weighted features are used by only a small number of training instances. For example, if 80% of the features with positive weights are each used by only one training instance, can we conclude that the model is overfitting those features?
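
To make the statistic I am describing concrete, here is a minimal sketch of how it could be computed. The function name `singleton_share`, the `max_support` parameter, and the toy data are all made up for illustration; the real inputs would be the 2,500 x 14,000 binary matrix and the learned weight vector.

```python
def singleton_share(X, weights, max_support=1):
    """Fraction of positively weighted features that are present in at
    most `max_support` training rows. Hypothetical helper, not a library API."""
    n_features = len(weights)
    # Support of a feature = number of training rows in which it is 1.
    support = [0] * n_features
    for row in X:
        for j, value in enumerate(row):
            if value:
                support[j] += 1
    # Indices of features whose learned weight is positive.
    positive = [j for j, w in enumerate(weights) if w > 0]
    if not positive:
        return 0.0
    rare = [j for j in positive if support[j] <= max_support]
    return len(rare) / len(positive)


# Toy data (invented): 4 samples, 5 binary word features.
X_toy = [
    [1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
w_toy = [0.9, 0.4, 0.3, -0.2, 0.5]  # hypothetical learned weights

share = singleton_share(X_toy, w_toy)
print(f"{share:.2f}")  # prints 0.75
```

With a fitted model, `weights` would be the learned coefficient vector (for example, the single row of `coef_` from scikit-learn's `LinearSVC`, if that is the library in use). A value of 0.8 from this function corresponds to the 80% case in my example.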