How to classify text when having very little training data

Question

I have a dataframe as follows:

New_Text  | New_Score
review1   | Positive
review2   | Negative
review4   | Positive

... and so on.

I want to create a model that tells whether a review is Positive or Negative. I have been asked to use only 30% of the data as training data and the rest as test data.
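Whatever classifier you pick, it helps to make the 30% training split stratified, so both classes appear in the training set in the same proportion as in the full data. A minimal stdlib sketch, assuming the dataframe has been turned into a list of dicts with the New_Text and New_Score columns (for example via `df.to_dict("records")`); the toy rows are made up:

```python
import random
from collections import defaultdict

def stratified_split(rows, label_key, train_frac=0.3, seed=0):
    """Split rows into (train, test), putting about train_frac of each label in train."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_train = round(len(group) * train_frac)
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test

# Hypothetical toy rows standing in for the New_Text / New_Score dataframe.
rows = [
    {"New_Text": f"review{i}", "New_Score": "Positive" if i % 2 else "Negative"}
    for i in range(1, 21)
]
train, test = stratified_split(rows, "New_Score", train_frac=0.3)
print(len(train), len(test))  # → 6 14
```

With pandas and scikit-learn installed, `train_test_split(df, train_size=0.3, stratify=df["New_Score"], random_state=0)` from `sklearn.model_selection` does the same kind of stratified split directly on the dataframe.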

Can I still use a simple Naive Bayes classifier or support vector machine when there is so little training data and so much test data? If not, how should I do text classification in such a case?
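A small training set does not by itself rule out Naive Bayes: it only estimates one smoothed word distribution per class, and it is commonly used as a first baseline. A minimal multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python, trained on made-up toy reviews, gives a reference point to compare other approaches against:

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; swap in a real one for actual reviews.
    return text.lower().split()

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in tokenize(d)}
    log_prior, log_likelihood = {}, {}
    for c in classes:
        c_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(c_docs) / len(docs))
        counts = Counter(w for d in c_docs for w in tokenize(d))
        denom = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return classes, log_prior, log_likelihood

def predict_nb(model, text):
    classes, log_prior, log_likelihood = model
    # Words never seen in training are skipped (they are not in the vocabulary).
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][w] for w in tokenize(text) if w in log_likelihood[c])
        for c in classes
    }
    return max(scores, key=scores.get)

# Hypothetical toy training data.
train_docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "boring plot awful acting",
]
train_labels = ["Positive", "Positive", "Negative", "Negative"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, "loved the great acting"))  # → Positive
print(predict_nb(model, "awful boring movie"))      # → Negative
```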

FWIW it's not the % split but the absolute amount of training data that will be your limiting factor. — C8H10N4O2, Aug 23 '17 at 20:29
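The comment's point can be checked empirically: train on increasing absolute numbers of examples and measure accuracy on a fixed test set. If accuracy is still climbing at the largest size you can afford, more labelled data is likely to help more than a different split ratio. A stdlib sketch; the 1-nearest-neighbour word-overlap classifier and the toy reviews are hypothetical placeholders for whatever model and data you actually use:

```python
def learning_curve(fit, predict, train_set, test_set, sizes):
    """Accuracy on a fixed test set after training on the first n examples, for each n."""
    results = []
    for n in sizes:
        model = fit(train_set[:n])
        correct = sum(predict(model, text) == label for text, label in test_set)
        results.append((n, correct / len(test_set)))
    return results

# Hypothetical placeholder classifier: 1-nearest neighbour by number of shared words.
def fit_1nn(examples):
    return [(set(text.lower().split()), label) for text, label in examples]

def predict_1nn(model, text):
    words = set(text.lower().split())
    # max() returns the first best match on ties, so results are deterministic.
    return max(model, key=lambda ex: len(words & ex[0]))[1]

# Toy data; in practice shuffle the training set before taking prefixes.
train_set = [
    ("great fun", "Positive"),
    ("dull and slow", "Negative"),
    ("loved the cast", "Positive"),
    ("hated the ending", "Negative"),
]
test_set = [("great cast", "Positive"), ("slow ending", "Negative")]
print(learning_curve(fit_1nn, predict_1nn, train_set, test_set, sizes=[1, 2, 4]))
# → [(1, 0.5), (2, 1.0), (4, 1.0)]
```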

Answer (answered Aug 25 '17 at 08:46)

Could you try doc2vec or another technique based on word embeddings?
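The appeal of embeddings with little labelled data is that the vectors are learned from a large unlabelled corpus, so words that never occur in the small training set still get meaningful representations. A minimal sketch of one simple embedding-based approach (averaging word vectors per review, then nearest-centroid classification by cosine similarity), not doc2vec itself; the three-dimensional vectors in EMB are made-up stand-ins for pretrained embeddings:

```python
import math

# Hypothetical toy "pretrained" word vectors (3-d). Real ones would come from
# embeddings trained on a large unlabelled corpus.
EMB = {
    "great": (0.9, 0.1, 0.0),
    "excellent": (0.8, 0.2, 0.1),
    "superb": (0.85, 0.15, 0.05),
    "awful": (-0.9, 0.1, 0.0),
    "terrible": (-0.8, 0.2, 0.1),
    "dreadful": (-0.85, 0.1, 0.05),
    "movie": (0.0, 0.9, 0.1),
    "film": (0.05, 0.85, 0.15),
}
DIM = 3

def doc_vector(text):
    """Average the vectors of the words that have embeddings."""
    vecs = [EMB[w] for w in text.lower().split() if w in EMB]
    if not vecs:
        return (0.0,) * DIM
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fit_centroids(docs, labels):
    """Mean document vector per class."""
    centroids = {}
    for c in sorted(set(labels)):
        vecs = [doc_vector(d) for d, y in zip(docs, labels) if y == c]
        centroids[c] = tuple(sum(col) / len(vecs) for col in zip(*vecs))
    return centroids

def predict_centroid(centroids, text):
    v = doc_vector(text)
    return max(centroids, key=lambda c: cosine(v, centroids[c]))

centroids = fit_centroids(["great movie", "awful film"], ["Positive", "Negative"])
# "superb" and "dreadful" never appear in the training reviews.
print(predict_centroid(centroids, "superb film"))     # → Positive
print(predict_centroid(centroids, "dreadful movie"))  # → Negative
```

In a real setup, EMB would be replaced by pretrained vectors (for example word2vec or GloVe), or doc_vector by the `infer_vector` method of a trained gensim `Doc2Vec` model, and the centroid step by any standard classifier.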

Also, chapter 18 of Jurafsky and Martin's Speech and Language Processing (3rd ed.) seems like a good starting point; it mentions using information from WordNet, which you can access from Python through NLTK, for example.
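One way WordNet can help with little labelled data is to grow a small seed list of positive and negative words by following synonym links, then use the resulting lexicon as an extra feature or as a simple rule-based scorer. A minimal sketch, assuming breadth-first propagation limited to a couple of hops (polarity can drift along longer synonym chains); this is an illustrative variant, not necessarily the exact method in the book. `wordnet_synonyms` shows the NLTK lookup, while the demo uses a hypothetical toy synonym table so it runs without downloading WordNet data:

```python
from collections import deque

def expand_lexicon(seeds, synonyms_of, max_depth=2):
    """Propagate seed polarity labels to synonyms, breadth-first, up to max_depth hops."""
    lexicon = dict(seeds)
    queue = deque((w, 0) for w in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue
        for syn in synonyms_of(word):
            if syn not in lexicon:  # first label reached wins
                lexicon[syn] = lexicon[word]
                queue.append((syn, depth + 1))
    return lexicon

def wordnet_synonyms(word):
    """Synonyms via NLTK's WordNet (needs nltk and nltk.download('wordnet')).
    Multi-word lemmas come back with underscores, e.g. 'well_behaved'."""
    from nltk.corpus import wordnet as wn
    names = {lemma.name().lower() for s in wn.synsets(word) for lemma in s.lemmas()}
    return sorted(names - {word})

def lexicon_score(text, lexicon):
    """Positive minus negative lexicon hits; usable as a feature or a crude classifier."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return hits.count("Positive") - hits.count("Negative")

# Hypothetical toy synonym table; replace the lambda with wordnet_synonyms for real use.
TOY = {"good": ["fine", "nice"], "nice": ["pleasant"], "bad": ["poor"], "poor": ["inferior"]}
lex = expand_lexicon({"good": "Positive", "bad": "Negative"}, lambda w: TOY.get(w, []))
print(lexicon_score("nice and pleasant", lex))  # → 2
```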
