Highest Voted 'nltk' Questions - Statistical Analysis Stack Exchange

19

votes

2 answers

How is the .similarity method in spaCy computed?

Not sure if this is the right Stack site, but here goes. How does the .similarity method work? Wow, spaCy is great! Its tfidf model could be easier, but w2v with only one line of code?! In his 10-line tutorial on spaCy, andrazhribernik shows us the…

asked Sep 21 '17 at 02:40

whs2k

451
1
3
10
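
As background for the question above: spaCy's documentation describes the default .similarity as a cosine similarity between vectors, where a Doc's vector defaults to the average of its token vectors. The sketch below only illustrates that arithmetic in plain Python. The helper names (cosine_similarity, mean_vector) and the toy 3-dimensional vectors are assumptions made for illustration; this is not spaCy's actual code, and real models use much larger vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: no direction, no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical word vectors for two short "documents".
doc_a = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
doc_b = [[1.0, 1.0, 2.0]]

similarity = cosine_similarity(mean_vector(doc_a), mean_vector(doc_b))
print(similarity)
```

Averaging discards word order, so two documents with the same words in a different order get the same vector under this scheme.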

6

votes

1 answer

Difference between Log Entropy Model and TF-IDF Model?

I would like to understand the differences/advantages of using TF-IDF or the Log Entropy model for representing documents and queries in an information retrieval system using different weights. I've tested both of them and computed the recall…

natural-language information-retrieval nltk tf-idf

asked May 30 '16 at 17:18

yolanda_dlh

63
1
5

3

votes

0 answers

Cross Co-occurrence between two corpora

I've looked around quite a bit for a solution to this problem, specifically in nltk, but couldn't find much help either on SO or elsewhere. My problem is as follows: I have a set of aligned pairs of sentences: [(p1, q1), (p2,…

python text-mining nltk cooccurrence

asked Mar 11 '16 at 18:52

user1669710

529
3
8

2

votes

2 answers

Tagging of tweets using NLTK

Is there a method to perform tagging of tweets using NLTK? The pos_tag() function gives incorrect results on Twitter data (which uses textese): # checking if NLTK tokenizers work on SMS textese tokens = pos_tag(word_tokenize("ikr smh he asked fir yo…

python natural-language nltk

asked Dec 13 '16 at 13:07

euler16

109
1
7

2

votes

2 answers

Is there any package in R/Python that can analyze the positive/negative sentiment of a whole review?

I'm a newbie here in the forum and new to text analytics using Python and R. My question is somewhat similar to Is there a better approach than counting positive-negative words in sentiment analysis? I'm working on a dataframe with 2000 rows of…

r python nltk

asked Nov 15 '16 at 07:11

sharathchandramandadi

145
5

2

votes

1 answer

How may I convert Perplexity to F Measure?

In machine learning practice, the accuracy of some models (like LDA) is measured by perplexity, while that of many others (Naive Bayes, HMM, etc.) is measured by F Measure. I would like to evaluate all the models against a common standard. I am looking to convert…

machine-learning error natural-language nltk

asked Mar 29 '16 at 18:00

HIGGINS

479
8
12

1

vote

1 answer

nltk multi_kappa (Davies and Fleiss) or alpha (Krippendorff)?

I'm using inter-rater agreement to evaluate the agreement in my rating dataset. I have a set of N examples distributed among M raters. Not all raters voted on every item, so I have N x M votes as the upper bound. So let's say the rater i gives the…

python agreement-statistics cohens-kappa nltk

asked Apr 26 '19 at 21:49

loretoparisi

153
8

1

vote

2 answers

Why does my sentiment analyser only output 1 label?

I am trying to do sentiment analysis on a corpus of product reviews. My corpus contains 50,000 samples, of which I take 70% for training and 30% for testing. I discretized the 5-star rating to 3 categories as follows: [0, 2] = negative [3] =…

natural-language nltk

asked Jun 03 '17 at 18:03

JNevens

269
1
3
15

1

vote

2 answers

NLTK: odd outputs from bleu_score

For machine translation purposes I use the BLEU score, which seems to be the validation mechanism of choice (used in the Sutskever 2014 sequence-to-sequence). The purpose is to get as high a BLEU as possible (between 0 and 1). The following mumble gives an…

machine-learning natural-language nltk machine-translation bleu

asked Mar 21 '16 at 10:16

Alexander R Johansen

125
5

0

votes

1 answer

When to use documents vs. sentences for Word2Vec?

I have a collection of words from different communities. Each community has a different way of using language and will provide a different word embedding. I can concatenate the sentences from the different communities to produce one corpus, but I…

python natural-language word2vec nltk

asked Nov 15 '20 at 17:18

VminVsky

3
1

0

votes

0 answers

NLTK BigramAssocMeasures.pmi is giving the same score for all bigrams

I am trying to use BigramAssocMeasures PMI to find the most important bigrams; however, it's giving all bigrams the same score, so I end up with a list in alphabetical order when I use .nbest. Whereas when I just bigram_measures.likelihood_ratio the…

python nltk

asked Feb 21 '19 at 00:38

Ruoran Huang

1
1
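
As a small illustration of how identical PMI scores like those in the question above can arise: pointwise mutual information compares a bigram's count with the counts of its two words, so when every word and every bigram occurs exactly once, every bigram receives the same score. The sketch below is a simplified plain-Python version that normalizes by the total token count. The function name bigram_pmi and the toy sentences are assumptions for illustration; it is not NLTK's implementation, and it shows only one possible cause, not a diagnosis of the asker's data.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Map each adjacent word pair to its PMI in bits.

    PMI(w1, w2) = log2( count(w1, w2) * N / (count(w1) * count(w2)) ),
    with N the number of tokens (a simplifying assumption of this sketch).
    """
    word_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    return {
        (w1, w2): math.log2(count * n / (word_counts[w1] * word_counts[w2]))
        for (w1, w2), count in bigram_counts.items()
    }


# Every word occurs once: all bigrams tie at log2(N).
scores_unique = bigram_pmi("alpha beta gamma delta".split())

# Repeated words break the tie.
scores_mixed = bigram_pmi("new york is big and new york is old".split())

for bigram, score in sorted(scores_mixed.items(), key=lambda kv: -kv[1]):
    print(bigram, round(score, 3))
```

Because PMI rewards rare pairs, a frequency filter applied before scoring is a common way to keep one-off bigrams from dominating the ranking.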

0

votes

1 answer

NLP various probabilities estimators in nltk

I saw there are many types of probability estimators in nltk: MLE, ELE, Laplace, Heldout, KneserNey, Lidstone, Random, WittenBell... What is the exact difference between them, and when should I use each? My goal is to get the entropy of a specific sentence…

python natural-language nltk entropy

asked Sep 12 '18 at 07:14

okuoub

27
8

0

votes

0 answers

Naive Bayes Assignment of Feature Probability

I'm using the .show_most_informative_features() function from NLTK's Naive Bayes to generate features to be used with a lexicon. In the case of my binary-classification problem, these features are calculated as (where W = feature and V =…

probability naive-bayes nltk

asked Aug 17 '18 at 19:40

Laurie

111
2

0

votes

0 answers

Handling False Positive in a Classifier

Suppose I have the following code of an NLTK Naive Bayes Classifier. It is a toy example of a sentiment analysis implementation. import nltk from nltk import NaiveBayesClassifier as nbc from nltk.tokenize import word_tokenize from itertools import…

machine-learning classification python natural-language nltk

asked Jun 19 '18 at 06:16

HIGGINS

479
8
12

0

votes

1 answer

How to classify text when having very little training data

I have a dataframe as follows:

New_Text | New_Score
review1 | Positive
review2 | Negative
review4 | Positive
... and so on.

I want to create a model that tells whether a review is Positive or Negative. I have been asked to use only 30% of…

python machine-learning scikit-learn nltk

asked Aug 23 '17 at 20:10

Rajiv

Questions tagged [nltk]