I have a dataframe as follows:

New_Text  | New_Score
review1   | Positive
review2   | Negative
review4   | Positive

... and so on.

I want to create a model that tells whether a review is Positive or Negative. I have been asked to use only 30% of the data as training data and the rest as test data.

Now, I can't use a simple Naive Bayes classifier or a Support Vector Machine, because there is very little training data and a lot of test data. How can I do text classification in such a case?

    FWIW it's not the % split but the absolute amount of training data that will be your limiting factor. – C8H10N4O2 Aug 23 '17 at 20:29

1 Answer

Can you try using doc2vec or any other word-embedding based technique?

Also, chapter 18 of Jurafsky and Martin's Speech and Language Processing (3rd ed.) seems like a good place to start. It mentions using information from WordNet, which is available, for example, through nltk.
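
As the comment on the question points out, what matters is the absolute number of training reviews, so it may be worth measuring a plain baseline on the 30% split before moving to embeddings. Below is a minimal sketch using only the standard library: a stratified 30/70 split followed by a multinomial Naive Bayes with add-one smoothing. The names `stratified_split` and `NaiveBayes`, the whitespace tokenizer, and the toy `rows` are all assumptions made for illustration; in practice the rows would come from the `New_Text` and `New_Score` columns.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(rows, train_fraction=0.3, seed=0):
    """Split (text, label) rows so each label keeps about train_fraction in train."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Keep at least one training example per label.
        cut = max(1, round(len(group) * train_fraction))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, rows):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        for text, label in rows:
            self.label_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for the New_Text / New_Score columns.
rows = [(f"great product loved it {i}", "Positive") for i in range(10)]
rows += [(f"terrible product hated it {i}", "Negative") for i in range(10)]

train, test = stratified_split(rows, train_fraction=0.3)
model = NaiveBayes().fit(train)
accuracy = sum(model.predict(text) == label for text, label in test) / len(test)
print(len(train), len(test), accuracy)
```

If a baseline like this already does well on the held-out 70%, the small training share is probably not the bottleneck. If it does not, that is where pretrained embeddings or WordNet-derived features are more likely to help.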

Jakub Bartczuk