Semantic Analysis: Set a default value for examples not in scope of the training set

Asked Mar 26 '18 at 23:52

I am working on a semantic analysis problem and would like to know whether anyone has been able to set a default value (say, a probability of 0 or 0.5) for phrases or words that the machine learning algorithm has never seen. Using scikit-learn's classifiers and nltk's word_vectorizer, I have gotten probability predictions of 1.0 for words and phrases that are not in the training set, which is potentially misleading output delivered with absolute confidence.
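A minimal sketch, assuming scikit-learn's `CountVectorizer` and `LogisticRegression` stand in for whatever vectorizer and classifier are actually in use: `transform` silently drops tokens that are not in the fitted vocabulary, so an input made entirely of unseen words becomes an all-zero row. The hypothetical wrapper `UnknownAwareClassifier` below detects such rows and returns a fixed default probability instead of the model's output.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


class UnknownAwareClassifier:
    """Hypothetical wrapper: returns a fixed default probability for inputs
    whose tokens are all absent from the training vocabulary."""

    def __init__(self, default_proba=0.5):
        self.default_proba = default_proba
        self.vectorizer = CountVectorizer()
        self.model = LogisticRegression()

    def fit(self, texts, labels):
        X = self.vectorizer.fit_transform(texts)
        self.model.fit(X, labels)
        return self

    def predict_proba_positive(self, texts):
        # Out-of-vocabulary tokens are dropped here, producing zero rows.
        X = self.vectorizer.transform(texts)
        proba = self.model.predict_proba(X)[:, 1]
        # Counts are non-negative, so a row sum of 0 means no known token.
        unknown = np.asarray(X.sum(axis=1)).ravel() == 0
        proba[unknown] = self.default_proba
        return proba
```

This only catches inputs with no known tokens at all; an input that mixes one known word with many unknown ones still receives the model's raw score.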

Would it help to add a dictionary of English words that are not in the training set, each labeled with a target of zero?

What about non-words or misspellings? How do you penalize unknown or unseen input without explicitly penalizing every permutation of words or spellings that is not in the corpus?
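One hedged way to handle misspellings and non-words without enumerating them is to measure how much of each input the fitted vocabulary actually covers, then shrink the prediction toward a neutral prior in proportion to the uncovered share. The helpers `known_token_fraction` and `shrunk_proba` are hypothetical, and the linear blend is an ad hoc heuristic rather than a calibrated method; the sketch assumes a fitted `CountVectorizer` and a binary scikit-learn classifier that exposes `predict_proba`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def known_token_fraction(vectorizer, text):
    """Fraction of a text's tokens that appear in the fitted vocabulary."""
    analyzer = vectorizer.build_analyzer()  # same tokenization as fit/transform
    tokens = analyzer(text)
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in vectorizer.vocabulary_)
    return known / len(tokens)


def shrunk_proba(vectorizer, model, texts, prior=0.5):
    """Heuristic: pull each prediction toward `prior` in proportion to the
    share of unknown tokens (misspellings, non-words, out-of-vocabulary)."""
    proba = model.predict_proba(vectorizer.transform(texts))[:, 1]
    result = []
    for text, p in zip(texts, proba):
        coverage = known_token_fraction(vectorizer, text)
        result.append(coverage * p + (1.0 - coverage) * prior)
    return result


texts = ["good movie", "great film", "bad movie", "awful film"]
labels = [1, 1, 0, 0]
vec = CountVectorizer()
model = LogisticRegression().fit(vec.fit_transform(texts), labels)
```

A complementary option is character n-gram features, for example `CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))`, which let a misspelling share many features with the correctly spelled word instead of being discarded outright.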

edited Mar 27 '18 at 05:13

MyopicVisage

0 Answers