Suppose I have the following code for an NLTK Naive Bayes classifier.

It is a toy example of a sentiment analysis implementation.

import nltk
from nltk import NaiveBayesClassifier as nbc
from nltk.tokenize import word_tokenize
from itertools import chain

training_data = [('I love this sandwich.', 'pos'),
('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
("What an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('He is my sworn enemy!', 'neg'),
('My boss is horrible.', 'neg')]

vocabulary = set(chain(*[word_tokenize(i[0].lower()) for i in training_data]))

feature_set = [({i:(i in word_tokenize(sentence.lower())) for i in vocabulary},tag) for sentence, tag in training_data]

classifier = nbc.train(feature_set)

test_sentence = "This is the best band I've ever heard!"
featurized_test_sentence =  {i:(i in word_tokenize(test_sentence.lower())) for i in vocabulary}

test_sentence1 = "Sun rises in the east"
featurized_test_sentence1 =  {i:(i in word_tokenize(test_sentence1.lower())) for i in vocabulary}

tag = classifier.classify(featurized_test_sentence)
print("TP:", tag)

tag1 = classifier.classify(featurized_test_sentence1)
print("FP:", tag1)

The first test sentence gets the tag "pos", which is a true positive (TP). The second test sentence also gets the tag "pos", which is a false positive (FP).

My objective: given an unseen application sentence that may be nowhere near anything in the training set, and whose predicted tag may therefore be an FP, how can I detect this automatically?
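
To make "nowhere near the training set" concrete, one possible sketch is to measure how many of a sentence's tokens were seen during training and flag sentences with low coverage. This is only an illustration under assumptions: the regex `tokenize` stands in for `word_tokenize`, and `vocabulary_coverage`, `is_out_of_vocabulary`, and the `min_coverage` cut-off are hypothetical names and values, not part of NLTK.

```python
import re

# Same toy training sentences as in the question (labels are not needed here).
training_sentences = [
    'I love this sandwich.',
    'This is an amazing place!',
    'I feel very good about these beers.',
    'This is my best work.',
    'What an awesome view',
    'I do not like this restaurant',
    'I am tired of this stuff.',
    "I can't deal with this",
    'He is my sworn enemy!',
    'My boss is horrible.',
]


def tokenize(text):
    """Lowercase and split into word tokens (assumption: a simple stand-in for word_tokenize)."""
    return re.findall(r"[a-z']+", text.lower())


vocabulary = {tok for sentence in training_sentences for tok in tokenize(sentence)}


def vocabulary_coverage(sentence, vocabulary):
    """Return the fraction of the sentence's tokens that occur in the training vocabulary."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(tok in vocabulary for tok in tokens) / len(tokens)


def is_out_of_vocabulary(sentence, vocabulary, min_coverage=0.2):
    """Flag a sentence whose coverage is below a hypothetical cut-off."""
    return vocabulary_coverage(sentence, vocabulary) < min_coverage


for s in ["This is the best band I've ever heard!", "Sun rises in the east"]:
    print(s, vocabulary_coverage(s, vocabulary), is_out_of_vocabulary(s, vocabulary))
```

A flagged sentence could then be treated as "unknown" instead of being passed to the classifier. The cut-off would have to be tuned on held-out data; the value above is arbitrary.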

The confusion matrix, show_most_informative_features(), and prob_classify() are not helping me.
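
For reference, here is a minimal sketch of what prob_classify() exposes and how a confidence threshold could be layered on top of it so that the classifier abstains instead of guessing. The helper names (`tokenize`, `featurize`, `classify_or_abstain`) and the `threshold` value are hypothetical, and the regex tokenizer is an assumption used in place of `word_tokenize`. With so little training data, the posterior may well fail to separate FPs from TPs.

```python
import re
from nltk import NaiveBayesClassifier

training_data = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ('What an awesome view', 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg'),
]


def tokenize(text):
    """Lowercase and split into word tokens (assumption: stand-in for word_tokenize)."""
    return re.findall(r"[a-z']+", text.lower())


vocabulary = {tok for sentence, _ in training_data for tok in tokenize(sentence)}


def featurize(sentence):
    """Map every vocabulary word to True/False depending on its presence in the sentence."""
    tokens = set(tokenize(sentence))
    return {word: (word in tokens) for word in vocabulary}


classifier = NaiveBayesClassifier.train(
    [(featurize(sentence), tag) for sentence, tag in training_data]
)


def classify_or_abstain(sentence, threshold=0.9):
    """Return (label, confidence); label is None when the posterior is below the threshold."""
    dist = classifier.prob_classify(featurize(sentence))
    best = dist.max()
    confidence = dist.prob(best)
    if confidence < threshold:
        return None, confidence
    return best, confidence


label, confidence = classify_or_abstain("Sun rises in the east")
print(label, round(confidence, 3))
```

For a sentence with no vocabulary overlap, every feature is False, so the posterior is driven only by the priors and the smoothed "word absent" likelihoods. In this toy setup it should stay fairly close to 0.5, which is why the sketch abstains on it.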

HIGGINS
  • Please give us more details. What exactly do you want to identify? What is your data? – Tim Jun 19 '18 at 06:20
  • Thank you for your prompt comment. My problem is this: suppose I implement a multi-class classifier such as Naive Bayes for sentiment analysis. Along with TPs, many application sentences may be FPs. If the input is a single sentence, we may not be able to generate a confusion matrix. In that case, how can we detect an FP automatically? – HIGGINS Jun 19 '18 at 06:27
  • Sorry but this is still unclear. Could you please edit your question to give us more details and possibly an example, so that we understand your problem better? – Tim Jun 19 '18 at 06:32
  • Sure, Sir. I am editing the question. – HIGGINS Jun 19 '18 at 06:33