Questions tagged [text-mining]

Refers to a subset of data mining concerned with extracting information from text data by recognizing patterns. The goal of text mining is often to classify a given document into one of a number of categories automatically, and to improve that performance dynamically, making it an example of machine learning. One example of this type of text mining is the spam filter used for email.

642 questions
173 votes • 3 answers

How does Keras 'Embedding' layer work?

I need to understand how the 'Embedding' layer in the Keras library works. I execute the following code in Python: import numpy as np from keras.models import Sequential from keras.layers import Embedding model = Sequential() model.add(Embedding(5, 2,…
prashanth • 3,747 • 4 • 21 • 33
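A minimal sketch of the idea behind this question, using plain NumPy rather than Keras (the names `table` and `embed` are illustrative, not Keras API): an `Embedding(5, 2)` layer is just a trainable lookup table of shape `(vocab_size=5, output_dim=2)` whose forward pass indexes rows by token id.

```python
import numpy as np

# The layer's weight: one 2-dimensional vector per vocabulary entry.
rng = np.random.default_rng(0)
table = rng.normal(size=(5, 2))

def embed(token_ids):
    # Input: integer ids of shape (batch, seq_len);
    # output: vectors of shape (batch, seq_len, 2).
    return table[np.asarray(token_ids)]

out = embed([[0, 1, 4]])
print(out.shape)  # (1, 3, 2)
```

Training then adjusts the rows of `table` by backpropagation, exactly as with any other weight matrix.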
45 votes • 2 answers

Difference between naive Bayes & multinomial naive Bayes

I've dealt with Naive Bayes classifier before. I've been reading about Multinomial Naive Bayes lately. Also Posterior Probability = (Prior * Likelihood)/(Evidence). The only prime difference (while programming these classifiers) I found between…
garak • 2,033 • 4 • 26 • 31
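A hand-rolled sketch of the distinction this question is after, on a toy corpus (not a full Bernoulli NB, which would also multiply in the probability of *absent* words): multinomial naive Bayes estimates P(word|class) from word counts, so a repeated word multiplies into the likelihood repeatedly, while the binarized variant only asks whether a word appears in a document.

```python
from collections import Counter

spam = ["free money now", "free free prize"]
ham = ["meeting at noon", "lunch meeting today"]
vocab = {w for d in spam + ham for w in d.split()}

def likelihood(doc, class_docs, multinomial=True, alpha=1.0):
    # multinomial=True: count every occurrence; False: one per document.
    words = (w for d in class_docs
               for w in (d.split() if multinomial else set(d.split())))
    counts = Counter(words)
    total = sum(counts.values())
    p = 1.0
    for w in doc.split():  # a word repeated in the test doc multiplies in again
        p *= (counts[w] + alpha) / (total + alpha * len(vocab))
    return p

print(likelihood("free free prize", spam), likelihood("free free prize", ham))
```

With Laplace smoothing (`alpha=1`) baked in, the posterior is then just prior × likelihood, normalized by the evidence, as the excerpt states.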
43 votes • 6 answers

How to quasi match two vectors of strings (in R)?

I am not sure how this should be termed, so please correct me if you know a better term. I've got two lists. One of 55 items (e.g: a vector of strings), the other of 92. The item names are similar but not identical. I wish to find the best…
Tal Galili • 19,935 • 32 • 133 • 195
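The usual term is fuzzy (or approximate) string matching. The question asks for R, where packages like `stringdist` fill this role; a sketch of the same idea with Python's stdlib `difflib` (the company names below are invented examples):

```python
import difflib

short = ["Appel Inc.", "Microsof", "Goggle LLC"]
long_ = ["Apple Inc.", "Microsoft", "Google LLC", "Amazon"]

# For each item in the shorter vector, take the closest item in the longer
# one whose similarity ratio clears the cutoff.
best = {s: difflib.get_close_matches(s, long_, n=1, cutoff=0.6) for s in short}
print(best)
```

Raising `cutoff` trades recall for precision; items with no match above the cutoff come back as empty lists, which is useful for flagging pairs that need manual review.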
35 votes • 8 answers

In Naive Bayes, why bother with Laplace smoothing when we have unknown words in the test set?

I was reading over Naive Bayes Classification today. I read, under the heading of Parameter Estimation with add-1 smoothing: Let $c$ refer to a class (such as Positive or Negative), and let $w$ refer to a token or word. The maximum likelihood…
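A tiny worked illustration of what is at stake (toy data, not from the question): the maximum-likelihood estimate assigns probability zero to any word unseen in a class's training text, which zeroes out the entire product of likelihoods; add-1 smoothing keeps every estimate strictly positive while still summing to one.

```python
from collections import Counter

train = "the movie was great great fun".split()
counts = Counter(train)
N = len(train)
V = len(counts) + 1  # observed vocabulary plus one slot for unseen words

def p_mle(w):
    return counts[w] / N              # 0 for any unseen word: kills the product

def p_laplace(w):
    return (counts[w] + 1) / (N + V)  # never 0, still a proper distribution

print(p_mle("terrible"), p_laplace("terrible"))
```

The question's point stands regardless: smoothing is about unseen *class/word combinations* at test time, which occur even when every test word was seen somewhere in training.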
33 votes • 6 answers

Statistical classification of text

I'm a programmer without statistical background, and I'm currently looking at different classification methods for a large number of different documents that I want to classify into pre-defined categories. I've been reading about kNN, SVM and NN.…
Emil H • 431 • 5 • 5
32 votes • 2 answers

StackExchange fires a moderator, and now in response hundreds of moderators resign: is the increase in resignations statistically significant?

I am doing a study on StackExchange. The management of StackExchange has demodded (for unclear reasons) a moderator, and now the network is on fire. Currently many moderators resign or suspend their activities because they are dissatisfied. I wish…
32 votes • 4 answers

Machine learning techniques for parsing strings?

I have a lot of address strings: 1600 Pennsylvania Ave, Washington, DC 20500 USA I want to parse them into their components: street: 1600 Pennsylvania Ave city: Washington province: DC postcode: 20500 country: USA But of course the data is dirty:…
Jay Hacker • 451 • 1 • 5 • 3
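Before reaching for machine learning, a regex baseline makes the difficulty concrete: the pattern below (illustrative, not a recommended parser) handles only the clean "street, city, ST zipcode country" case, and dirty real-world data is exactly why answers to this question point at learned sequence taggers such as CRFs.

```python
import re

# Only matches well-formed US-style addresses; any missing comma,
# reordered field, or typo breaks it.
pattern = re.compile(
    r"(?P<street>.+), (?P<city>[^,]+), "
    r"(?P<province>[A-Z]{2}) (?P<postcode>\d{5}) (?P<country>\w+)"
)

m = pattern.match("1600 Pennsylvania Ave, Washington, DC 20500 USA")
print(m.groupdict())
```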
30 votes • 4 answers

R packages for performing topic modeling / LDA: just `topicmodels` and `lda`

It seems to me that only two R packages are able to perform Latent Dirichlet Allocation: One is lda, authored by Jonathan Chang; and the other is topicmodels authored by Bettina Grün and Kurt Hornik. What are the differences between these two…
bit-question • 2,637 • 6 • 25 • 26
30 votes • 3 answers

How well does R scale to text classification tasks?

I am trying to get up to speed with R. I eventually want to use R libraries for doing text classification. I was just wondering what people's experiences are with regard to R's scalability when it comes to doing text classification. I am likely to…
Andy • 1,583 • 3 • 21 • 19
29 votes • 1 answer

Is cross validation a proper substitute for validation set?

In text classification, I have a training set with about 800 samples and a test set with about 150 samples. The test set has never been used and is being held out until the end. I am using the whole 800-sample training set, with 10-fold cross…
Flake • 1,131 • 2 • 13 • 21
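The setup in the question (800 training samples, 10-fold CV) can be sketched with a stdlib fold generator (illustrative, not any particular library's API); the key property cross-validation buys over a single validation split is that every sample is validated on exactly once:

```python
import random

def kfold(n, k=10, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, held_out

splits = list(kfold(800))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 720 80
```

Note this replaces the *validation* set only; the untouched 150-sample test set still serves its separate purpose of estimating final generalization error.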
28 votes • 2 answers

Bag-of-Words for Text Classification: Why not just use word frequencies instead of TFIDF?

A common approach to text classification is to train a classifier on a 'bag-of-words'. The user takes the text to be classified and counts the frequencies of the words in each object, followed by some sort of trimming to keep the resulting…
shf8888 • 845 • 1 • 7 • 11
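A minimal TF-IDF computation on an invented three-document corpus shows the difference the question asks about: raw frequency would weight a ubiquitous word like "the" as heavily as a discriminative one like "cat", while the inverse-document-frequency factor drives the weight of a word found in every document to zero.

```python
import math
from collections import Counter

docs = [d.split() for d in ["the cat sat", "the dog sat", "the cat ran home"]]
N = len(docs)
df = Counter(w for d in docs for w in set(d))  # how many docs contain each word

def tfidf(doc):
    tf = Counter(doc)
    # log(N / df) is 0 when a word occurs in all N documents.
    return {w: tf[w] * math.log(N / df[w]) for w in tf}

print(tfidf(docs[0]))
```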
26 votes • 3 answers

Topic models and word co-occurrence methods

Popular topic models like LDA usually cluster words that tend to co-occur together into the same topic (cluster). What is the main difference between such topic models, and other simple co-occurrence based clustering approaches like PMI? (PMI…
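For reference, pointwise mutual information as mentioned in the question can be computed directly from document co-occurrence counts (toy corpus below is invented): it is positive exactly when two words co-occur more often than independence would predict, with no latent topic variables involved.

```python
import math
from collections import Counter
from itertools import combinations

docs = [{"data", "mining", "text"}, {"data", "mining"},
        {"text", "corpus"}, {"data", "science"}]
N = len(docs)
word = Counter(w for d in docs for w in d)
pair = Counter(frozenset(p) for d in docs for p in combinations(sorted(d), 2))

def pmi(a, b):
    # log of observed co-occurrence rate over the independence baseline.
    return math.log((pair[frozenset((a, b))] / N)
                    / ((word[a] / N) * (word[b] / N)))

print(round(pmi("data", "mining"), 3))
```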
23 votes • 1 answer

Has the reported state-of-the-art performance of using paragraph vectors for sentiment analysis been replicated?

I was impressed by the results in the ICML 2014 paper "Distributed Representations of Sentences and Documents" by Le and Mikolov. The technique they describe, called "paragraph vectors", learns unsupervised representations of arbitrarily-long…
21 votes • 2 answers

How to calculate perplexity of a holdout with Latent Dirichlet Allocation?

I'm confused about how to calculate the perplexity of a holdout sample when doing Latent Dirichlet Allocation (LDA). The papers on the topic breeze over it, making me think I'm missing something obvious... Perplexity is seen as a good measure of…
drevicko • 394 • 1 • 3 • 11
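The definition itself is simple once the holdout's per-token log-likelihoods are in hand (the hard part the question is really about is estimating those likelihoods for held-out documents under LDA): perplexity is the exponentiated average negative log-likelihood per token.

```python
import math

def perplexity(log_probs):
    # log_probs: per-token log-likelihoods of the holdout under the model.
    return math.exp(-sum(log_probs) / len(log_probs))

# Sanity check: a model uniform over a 10-word vocabulary assigns every
# token log(1/10), so its perplexity is 10 regardless of holdout length.
print(perplexity([math.log(1 / 10)] * 50))  # ≈ 10
```

Lower is better; a perplexity of $k$ means the model is, on average, as uncertain as a uniform choice among $k$ words.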
20 votes • 2 answers

Why does ridge regression classifier work quite well for text classification?

During an experiment for text classification, I found the ridge classifier generating results that consistently top the tests among those classifiers that are more commonly mentioned and applied for text mining tasks, such as SVM, NB, kNN, etc. Though, I…
Flake • 1,131 • 2 • 13 • 21
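A NumPy sketch of the mechanism (toy data; a simplification of what e.g. scikit-learn's `RidgeClassifier` does): ridge-regress ±1 class labels on the bag-of-words matrix, then threshold the regression score at zero. The penalty term keeps the normal equations well-conditioned even when there are far more terms than documents, which is the typical text-classification regime.

```python
import numpy as np

# Tiny bag-of-words matrix: rows = documents, columns = term counts.
X = np.array([[2, 0, 1], [0, 3, 1], [1, 0, 2], [0, 2, 2]], dtype=float)
y = np.array([1, -1, 1, -1])  # two classes encoded as +/-1

lam = 1.0  # ridge penalty: X'X + lam*I stays invertible despite collinear terms
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

pred = np.sign(X @ w)  # regression score thresholded at zero = class decision
print(pred)
```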