No Training Dataset for Sentiment Analysis Algo

Question

I am learning about the potential of Sentiment Analysis and have gone through many examples, but I am still unsure whether I understand one crucial detail: does one always need to start Sentiment Analysis with a previously labeled training set?

Let's say I am a small business that has just implemented an online feedback form, and my customers have started sending me comments about my product. In this case I would have no training set. Would I have to manually label each comment with a subjective score e.g. 1 to 5? Wouldn't this process heavily rely on subjective opinions of whoever ends up labeling the comments, e.g. who is to say whether a comment such as "I kind of liked the product but I am not sure if I will be buying again" deserves a 2 or a 3 as a sentiment score?

score 2 · Accepted Answer · answered Aug 31 '19 at 06:09

"does one always need to start Sentiment Analysis with a previously labeled training set?"

You can use a pre-trained model and test how well it performs on data from your domain (see, for instance, this blog).
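Testing a pre-trained model on your domain only needs a small hand-labelled sample of your own comments. The sketch below assumes a hypothetical `pretrained_predict` function standing in for whatever off-the-shelf classifier you try; the sample comments are made up. It reports accuracy plus a count of (gold, predicted) pairs, which shows *where* the model goes wrong.

```python
from collections import Counter

# Hypothetical stand-in for a pre-trained model; in practice this would wrap
# the off-the-shelf classifier you are evaluating.
def pretrained_predict(text: str) -> str:
    text = text.lower()
    if "love" in text or "great" in text:
        return "good"
    if "broken" in text or "refund" in text:
        return "bad"
    return "neutral"

def evaluate(predict, labelled):
    """Return accuracy and a count of (gold, predicted) label pairs."""
    confusion = Counter()
    correct = 0
    for text, gold in labelled:
        pred = predict(text)
        confusion[(gold, pred)] += 1
        correct += int(pred == gold)
    return correct / len(labelled), confusion

# A small hand-labelled sample from your own feedback form (made-up examples).
sample = [
    ("I love this product", "good"),
    ("Arrived broken, I want a refund", "bad"),
    ("It does the job", "neutral"),
    ("Great value, but shipping was slow", "neutral"),
]

accuracy, confusion = evaluate(pretrained_predict, sample)
print(f"accuracy = {accuracy:.2f}")
for (gold, pred), n in sorted(confusion.items()):
    print(f"gold={gold:8} pred={pred:8} n={n}")
```

Here the mixed comment "Great value, but shipping was slow" is misread as `good`, the kind of domain-specific error a few dozen labelled examples can reveal before you rely on the model.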

You can also use a more sophisticated transfer learning approach: take a trained model and fine-tune it on a small amount of labelled data you provide (see, for instance, this blog).
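As a toy analogue of fine-tuning (not the neural transfer learning the linked blog describes), the sketch below starts a bag-of-words logistic regression from hypothetical "pre-trained" word weights and then continues training it on a few labelled comments from your own domain. The weights, comments and hyperparameters (`lr`, `epochs`) are all assumptions chosen for illustration.

```python
import math

# Hypothetical "pre-trained" word weights, e.g. derived from a generic
# sentiment lexicon; in practice this would be a real trained model.
pretrained = {"good": 1.0, "great": 1.5, "bad": -1.0, "slow": -0.5}

# A handful of labelled comments from your own domain (1 = positive, 0 = negative).
domain_data = [
    ("battery lasts forever", 1),
    ("battery died fast", 0),
    ("great screen but battery died", 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, text):
    # Bag-of-words linear score; unknown words contribute 0.
    return sum(weights.get(tok, 0.0) for tok in text.split())

def fine_tune(weights, data, lr=0.5, epochs=50):
    """Logistic-regression SGD warm-started from the pre-trained weights."""
    w = dict(weights)  # copy, so the original model is left untouched
    for _ in range(epochs):
        for text, y in data:
            err = sigmoid(score(w, text)) - y  # gradient of log-loss w.r.t. the score
            for tok in text.split():
                w[tok] = w.get(tok, 0.0) - lr * err
    return w

tuned = fine_tune(pretrained, domain_data)
for text, y in domain_data:
    before = sigmoid(score(pretrained, text))
    after = sigmoid(score(tuned, text))
    print(f"{text!r}: label={y} before={before:.2f} after={after:.2f}")
```

The generic weights rate "great screen but battery died" as positive because of "great"; a few domain examples are enough to learn that "died" matters more for this product.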

"In this case I would have no training set. Would I have to manually label each comment with a subjective score e.g. 1 to 5?"

In general, yes, and generating labelled datasets is a major bottleneck in supervised machine learning. However, you could also label data programmatically, using heuristic rules and predictions from some pre-trained model (see, for instance, Snorkel).
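The idea behind programmatic labelling can be sketched with a few "labelling functions": heuristics that each vote for a label or abstain. This is plain Python, not Snorkel's API; Snorkel learns how much to trust each function, whereas this sketch substitutes a simple majority vote. The keyword rules and comments are hypothetical, and a pre-trained model's prediction could be added as one more voting function.

```python
from collections import Counter

ABSTAIN = None  # a labelling function returns this when it has no opinion

# Hypothetical heuristic labelling functions (keyword rules chosen for illustration).
def lf_positive_words(text):
    return "good" if any(w in text for w in ("love", "liked", "great")) else ABSTAIN

def lf_negative_words(text):
    return "bad" if any(w in text for w in ("broken", "refund", "terrible")) else ABSTAIN

def lf_hedging(text):
    return "neutral" if any(w in text for w in ("kind of", "not sure")) else ABSTAIN

LFS = [lf_positive_words, lf_negative_words, lf_hedging]

def majority_vote(text, lfs):
    """Combine labelling-function votes; return ABSTAIN on no votes or a tie."""
    votes = [lf(text.lower()) for lf in lfs]
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence: leave it for a human to label
    return ranked[0][0]

comments = [
    "I love it, great quality",
    "Arrived broken, terrible support",
    "I kind of liked the product but I am not sure if I will be buying again",
    "Shipping was fine",
]
labels = [majority_vote(c, LFS) for c in comments]
for c, lab in zip(comments, labels):
    print(f"{lab!s:8} <- {c}")
```

Clear-cut comments get labelled automatically, while the ambiguous example from the question (one positive vote, one neutral vote) is left unlabelled, so manual effort goes only where the rules disagree.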

"Wouldn't this process heavily rely on subjective opinions of whoever ends up labeling the comments, e.g. who is to say whether a comment such as "I kind of liked the product but I am not sure if I will be buying again" deserves a 2 or a 3 as a sentiment score?"

Yes, the labelling process would be somewhat subjective, but this might not matter for your application. You can also simplify the labelling process by choosing a few discrete labels (e.g. bad, neutral, good). More formally, you can also model this ambiguity as noise in the labelling process and account for it during classification (see again Snorkel, for example).
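One way to put a number on labelling subjectivity (not mentioned in the answer above, but a standard measure) is Cohen's kappa: the agreement between two annotators, corrected for agreement expected by chance. The sketch compares kappa on a 1 to 5 scale with kappa after collapsing to bad/neutral/good; the annotator scores are made up and the collapse thresholds are an assumption.

```python
from collections import Counter

def to_coarse(score):
    """Collapse a 1-5 rating into bad/neutral/good (thresholds are an assumption)."""
    if score <= 2:
        return "bad"
    if score == 3:
        return "neutral"
    return "good"

def cohens_kappa(a, b):
    """Agreement between two annotators, corrected for agreement expected by chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Made-up 1-5 scores from two annotators for the same six comments.
annotator_1 = [2, 3, 5, 1, 4, 3]
annotator_2 = [3, 3, 4, 2, 5, 2]

kappa_fine = cohens_kappa(annotator_1, annotator_2)
kappa_coarse = cohens_kappa([to_coarse(s) for s in annotator_1],
                            [to_coarse(s) for s in annotator_2])
print(f"1-5 scale:        kappa = {kappa_fine:.2f}")
print(f"bad/neutral/good: kappa = {kappa_coarse:.2f}")
```

With these numbers the 1 to 5 scale gives a kappa of about -0.07 (no better than chance), while the three coarse labels give 0.50, illustrating why fewer, broader labels make manual labelling less dependent on who does it.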

score 0 · Answer 2 · edited Jun 11 '20 at 14:32


You could use a sentiment dictionary (a lexicon of words marked as positive or negative) as a first step. See NLTK:

Harvard General Inquirer

URL: http://www.wjh.harvard.edu/~inquirer/

PAPERS: The General Inquirer: A Computer Approach to Content Analysis (Stone, Philip J.; Dexter C. Dunphy; Marshall S. Smith; and Daniel M. Ogilvie. 1966)
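A dictionary-based first step can be as simple as counting positive and negative words. The sketch below uses tiny hypothetical word lists in place of a real lexicon such as the General Inquirer, plus a naive rule (an assumption, not part of any particular lexicon) that flips the polarity of a word directly after "not", "never" or "no".

```python
import re

# Tiny hypothetical word lists; a real system would load a full lexicon.
POSITIVE = {"like", "liked", "love", "great", "good", "happy"}
NEGATIVE = {"bad", "broken", "hate", "poor", "slow", "disappointed"}
NEGATORS = {"not", "never", "no"}

def lexicon_score(text):
    """(#positive - #negative) words, flipping a word that follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = int(tok in POSITIVE) - int(tok in NEGATIVE)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

def label(text):
    s = lexicon_score(text)
    return "good" if s > 0 else "bad" if s < 0 else "neutral"

for comment in [
    "I love it, great quality",
    "Not good, arrived broken",
    "I kind of liked the product but I am not sure if I will be buying again",
]:
    print(lexicon_score(comment), label(comment), "<-", comment)
```

The last comment scores as `good` because only "liked" is in the lexicon; the hedging is invisible to word counting, which is why a lexicon is best treated as a starting point rather than a final labeller.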

There are also out-of-the-box approaches, such as vaderSentiment.


answered Aug 31 '19 at 07:20 by user2672299

