Questions tagged [nltk]

NLTK stands for Natural Language ToolKit, a Python-based platform for working with human language data.

19 questions
19 votes, 2 answers

How is the .similarity method in spaCy computed?

Not sure if this is the right Stack Exchange site, but here goes. How does the .similarity method work? Wow, spaCy is great! Its tfidf model could be easier, but w2v with only one line of code?! In his 10-line tutorial on spaCy, andrazhribernik shows us the…
whs2k
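spaCy's `.similarity` returns the cosine similarity of the two objects' vectors, and by default a `Doc`'s vector is the average of its token vectors. The pure-Python sketch below reproduces that idea with a hypothetical three-dimensional `TOY_VECTORS` table standing in for a real model's word vectors; the helper names are illustrative, not spaCy's API.

```python
import math

# Hypothetical toy word vectors; a real spaCy model ships much larger ones.
TOY_VECTORS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "cat": [0.7, 0.3, 0.0],
    "car": [0.0, 0.2, 0.9],
}

def average_vector(tokens):
    """Mean of the known token vectors, mirroring spaCy's default Doc.vector."""
    dim = len(next(iter(TOY_VECTORS.values())))
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity(doc_a, doc_b):
    """Whitespace-tokenize both texts, average their vectors, compare by cosine."""
    return cosine(average_vector(doc_a.split()), average_vector(doc_b.split()))

print(similarity("dog cat", "puppy cat"))  # close to 1
print(similarity("dog", "car"))            # much lower
```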
6 votes, 1 answer

Difference between Log Entropy Model and TF-IDF Model?

I would like to understand what the differences/advantages are in using TF-IDF or the Log Entropy model for representing documents and queries in an information retrieval system using different weights. I've tested both of them and computed the recall…
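As a rough illustration, the sketch below computes one common formulation of each weighting on a toy corpus: TF-IDF as tf × log(N/df), and log-entropy as log(1 + tf) times a global weight 1 + Σ p log p / log N, where p is the share of a term's total corpus frequency that falls in each document. Libraries differ in log base and normalization, so treat these formulas as assumptions rather than any particular library's exact implementation; the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Classic tf * log(N / df) weighting (one of several TF-IDF variants)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n_docs / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]

def log_entropy_weights(docs):
    """Local weight log(1 + tf) times global weight 1 + sum_j p_ij log p_ij / log N."""
    n_docs = len(docs)
    if n_docs < 2:
        raise ValueError("log-entropy needs at least two documents (log N must be > 0)")
    counts = [Counter(doc) for doc in docs]
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)
    global_weight = {}
    for term, gf in global_freq.items():
        # Sum of p log p over the documents containing the term (<= 0).
        entropy = sum((c[term] / gf) * math.log(c[term] / gf) for c in counts if term in c)
        global_weight[term] = 1 + entropy / math.log(n_docs)
    return [{t: math.log(1 + tf) * global_weight[t] for t, tf in c.items()} for c in counts]

docs = [["the", "apple"], ["the", "banana"], ["the", "cherry", "cherry"]]
print(tfidf_weights(docs)[2])
print(log_entropy_weights(docs)[2])
```

A term spread evenly over every document gets weight 0 under both schemes, while repeated occurrences grow linearly under this TF-IDF variant but only logarithmically under log-entropy.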
3 votes, 0 answers

Cross Co-occurrence between two corpora

I've looked around quite a bit for a solution to this problem, specifically in NLTK, but couldn't find much help either on SO or elsewhere. My problem is as follows: I have a set of aligned pairs of sentences: [(p1, q1), (p2,…
user1669710
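One way to read "cross co-occurrence" is: for every aligned pair (p, q), count each pairing of a word from p with a word from q. The sketch below assumes that interpretation and simple whitespace tokenization; `cross_cooccurrence` and the example pairs are hypothetical.

```python
from collections import Counter
from itertools import product

def cross_cooccurrence(aligned_pairs):
    """Count how often word a (from the first sentence) co-occurs with word b
    (from its aligned sentence) across all aligned pairs.

    Each word is counted once per sentence (set semantics), so a word repeated
    inside one sentence does not inflate the counts.
    """
    counts = Counter()
    for p, q in aligned_pairs:
        counts.update(product(set(p.split()), set(q.split())))
    return counts

pairs = [
    ("the house is small", "das haus ist klein"),
    ("the house is old", "das haus ist alt"),
]
counts = cross_cooccurrence(pairs)
print(counts[("house", "haus")])  # 2: the pairing appears in both aligned pairs
print(counts[("small", "alt")])   # 0: never aligned together
```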
2 votes, 2 answers

Tagging of tweets using NLTK

Is there a method to perform tagging of tweets using NLTK? The pos_tag() function gives incorrect results on Twitter data (which uses textese): # checking if NLTK tokenizers work on SMS textese tokens = pos_tag(word_tokenize("ikr smh he asked fir yo…
euler16
2 votes, 2 answers

Is there any package in R/Python that can analyze the positive/negative sentiment of a whole review?

I'm a newbie here in the forum and new to text analytics using Python and R. My question is somewhat similar to "Is there a better approach than counting positive-negative words in sentiment analysis?" I'm working on a dataframe with 2000 rows of…
2 votes, 1 answer

How can I convert perplexity to F-measure?

In machine learning practice, some models (like LDA) are evaluated by perplexity, while many others (Naive Bayes, HMM, etc.) are evaluated by F-measure. I would like to evaluate all the models with a common standard. I am looking to convert…
HIGGINS
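Perplexity and F-measure take different inputs: perplexity is computed from the probabilities a model assigns to held-out data, while F-measure is computed from hard predictions compared against gold labels, so there is no general formula converting one into the other. A minimal sketch of both definitions side by side (function names are illustrative):

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability assigned to held-out tokens."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

def f1_score(gold, predicted, positive):
    """Harmonic mean of precision and recall for one positive label."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(perplexity([0.25, 0.25, 0.25, 0.25]))  # about 4 (uniform over 4 outcomes)
print(f1_score(["pos", "neg", "pos"], ["pos", "pos", "neg"], positive="pos"))  # 0.5
```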
1 vote, 1 answer

nltk multi_kappa (Davies and Fleiss) or alpha (Krippendorff)?

I'm using inter-rater agreement to evaluate the agreement in my rating dataset. I have a set of N examples distributed among M raters. Not all raters voted on every item, so N x M votes is the upper bound. So let's say rater i gives the…
loretoparisi
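NLTK's `nltk.metrics.agreement.AnnotationTask` provides both `multi_kappa()` and `alpha()`. Krippendorff's alpha is designed to tolerate missing ratings: an item rated by only one rater is left out, and every other item contributes all of its available rating pairs. A minimal pure-Python sketch of alpha for nominal labels, assuming each item's available labels are given as a list (`krippendorff_alpha_nominal` is a hypothetical helper, not NLTK's code):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """units: one list of labels per item, holding only the ratings that exist."""
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single rating cannot be paired, so it carries no agreement info
        for a, b in permutations(labels, 2):  # all ordered pairs of distinct raters
            coincidence[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _b), weight in coincidence.items():
        totals[a] += weight  # n_c: number of pairable values with label c
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when all pairable ratings share one label")
    # alpha = 1 - D_o / D_e, with both disagreements written in terms of counts.
    return 1.0 - (n - 1) * observed / expected

# Item 3 was rated by a single rater, so it is simply ignored.
print(krippendorff_alpha_nominal([["a", "a", "a"], ["b", "b"], ["a"]]))
print(krippendorff_alpha_nominal([["a", "b"], ["a", "b"]]))
```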
1 vote, 2 answers

Why does my sentiment analyser only output 1 label?

I am trying to do sentiment analysis on a corpus of product reviews. My corpus contains 50,000 samples, of which I take 70% for training and 30% for testing. I discretized the 5-star rating into 3 categories as follows: [0, 2] = negative [3] =…
JNevens
1 vote, 2 answers

NLTK: odd outputs from bleu_score

For machine translation I use the BLEU score, which seems to be the validation metric of choice (it is used in the Sutskever et al. 2014 sequence-to-sequence paper). The goal is to get as high a BLEU score as possible (between 0 and 1). The following mumble gives an…
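Sentence-level BLEU is the brevity penalty times the geometric mean of modified n-gram precisions (usually n = 1 to 4), so a single n-gram order with zero matches drives the whole score to 0 even when lower-order overlap is high. That is one frequent source of surprising sentence-level scores; NLTK's `sentence_bleu` accepts a `smoothing_function` (see `SmoothingFunction`) for this case. The unsmoothed, single-reference sketch below is illustrative only; `simple_sentence_bleu` is a hypothetical name, not NLTK's implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed single-reference BLEU: brevity penalty * geometric mean of clipped precisions."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum(min(count, ref[g]) for g, count in cand.items())  # clipped matches
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # zero matches (or candidate too short for this order) -> score 0
        log_precisions.append(math.log(overlap / total))
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)

ref = "a b c d e".split()
print(simple_sentence_bleu(ref, "a b c d e".split()))  # 1.0
print(simple_sentence_bleu(ref, "a b x c d".split()))  # 0.0: no 4-gram matches
```

In the second call, unigram precision is 4/5 and bigram precision is 2/4, yet the missing 4-gram match zeroes the score.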
0 votes, 1 answer

When to use documents vs. sentences for Word2Vec?

I have a collection of words from different communities. Each community has a different way of using language and will provide a different word embedding. I can concatenate the sentences from the different communities to produce one corpus, but I…
VminVsky
0 votes, 0 answers

NLTK BigramAssocMeasures.pmi gives the same score for all bigrams

I am trying to use BigramAssocMeasures PMI to find the most important bigrams; however, it gives all bigrams the same score, so I end up with a list in alphabetical order when I use .nbest. Whereas when I just use bigram_measures.likelihood_ratio the…
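PMI for a bigram (x, y) is log2(count(x, y) · N / (count(x) · count(y))), with N the number of tokens. If every word and every bigram occurs exactly once, as in a very small corpus, every bigram gets the same score and the ranking is decided purely by tie-breaking. The sketch below (a hypothetical `bigram_pmi` helper that approximates both the unigram and bigram totals by N) shows such a tie; filtering out rare bigrams before ranking, for example with `apply_freq_filter` on NLTK's collocation finders, is a usual remedy.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """PMI(x, y) = log2(count(x, y) * N / (count(x) * count(y))), N = number of tokens."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (x, y): math.log2(c * n / (unigrams[x] * unigrams[y]))
        for (x, y), c in bigrams.items()
    }

# Every word occurs once, so every bigram gets the identical score log2(5).
print(set(bigram_pmi("a quick brown fox jumps".split()).values()))

# With repeated words the scores separate.
print(bigram_pmi("new york new york city".split()))
```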
0 votes, 1 answer

NLP: various probability estimators in NLTK

I saw there are many types of probability distributions in NLTK: MLE, ELE, Laplace, Heldout, KneserNey, Lidstone, Random, WittenBell. What exactly is the difference between them, and when should I use each? My goal is to get the entropy of a specific sentence…
okuoub
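Several of these estimators are one formula with a different constant: Lidstone estimates P(w) = (c(w) + γ) / (N + Bγ), where B is the number of bins (possible outcomes). γ = 0 reduces to MLE, γ = 1 gives Laplace and γ = 0.5 gives ELE, matching NLTK's `LidstoneProbDist`, `LaplaceProbDist` and `ELEProbDist`; Heldout, Kneser-Ney and Witten-Bell rest on different ideas and are not covered here. The sketch below assumes a unigram model and a hypothetical vocabulary size of 6 to show the difference on an unseen word, and computes per-word sentence entropy in bits.

```python
import math
from collections import Counter

def lidstone_prob(word, counts, bins, gamma):
    """(c(w) + gamma) / (N + bins * gamma); gamma=0 is MLE, 1 is Laplace, 0.5 is ELE."""
    n = sum(counts.values())
    return (counts[word] + gamma) / (n + bins * gamma)

def sentence_entropy(sentence, counts, bins, gamma):
    """Per-word cross-entropy (bits) of a sentence under a unigram Lidstone model.

    With gamma=0 any unseen word has probability 0 and log2 raises ValueError.
    """
    words = sentence.split()
    return -sum(math.log2(lidstone_prob(w, counts, bins, gamma)) for w in words) / len(words)

counts = Counter("the cat sat on the mat".split())
bins = 6  # hypothetical vocabulary: the 5 seen word types plus the unseen "dog"
print(lidstone_prob("dog", counts, bins, gamma=0))  # 0.0: MLE gives unseen words zero probability
print(lidstone_prob("dog", counts, bins, gamma=1))  # Laplace: 1 / (6 + 6)
print(sentence_entropy("the dog sat", counts, bins, gamma=0.5))
```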
0 votes, 0 answers

Naive Bayes Assignment of Feature Probability

I'm using the .show_most_informative_features() function from NLTK's Naive Bayes to generate features to be used with a lexicon. In the case of my binary-classification problem, these features are calculated as (where W = feature and V =…
Laurie
0 votes, 0 answers

Handling False Positives in a Classifier

Suppose I have the following code for an NLTK Naive Bayes classifier. It is a toy example of a sentiment analysis implementation. import nltk from nltk import NaiveBayesClassifier as nbc from nltk.tokenize import word_tokenize from itertools import…
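If the classifier produces too many false positives, one common lever is the decision threshold: instead of taking the most probable label, predict positive only when the positive-class probability (for an NLTK classifier, `prob_classify(features).prob(label)`) reaches a chosen cutoff. The sketch below uses hypothetical scores to show how raising the cutoff trades false positives for false negatives; `confusion_at_threshold` is an illustrative helper.

```python
def confusion_at_threshold(scored, threshold):
    """scored: (probability of the positive class, gold label is positive) pairs.

    Returns (true positives, false positives, false negatives) at this cutoff.
    """
    tp = sum(1 for p, gold in scored if p >= threshold and gold)
    fp = sum(1 for p, gold in scored if p >= threshold and not gold)
    fn = sum(1 for p, gold in scored if p < threshold and gold)
    return tp, fp, fn

# Hypothetical scores, e.g. collected from classifier.prob_classify(feats).prob("pos").
scored = [(0.95, True), (0.80, True), (0.70, False), (0.60, True), (0.55, False), (0.20, False)]
for threshold in (0.5, 0.65, 0.75):
    tp, fp, fn = confusion_at_threshold(scored, threshold)
    print(f"threshold={threshold}: tp={tp} fp={fp} fn={fn}")
```

The cutoff would normally be tuned on held-out data rather than on the test set.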
0 votes, 1 answer

How to classify text when having very little training data

I have a dataframe as follows:

New_Text | New_Score
review1 | Positive
review2 | Negative
review4 | Positive
...

and so on. I want to create a model that tells whether a review is Positive or Negative. I have been asked to use only 30% of…
Rajiv