Questions tagged [natural-language]

Natural Language Processing (NLP) is a set of techniques from linguistics, artificial intelligence, machine learning, and statistics that aim to process and understand human languages.

NLP Tasks

Typical NLP tasks are:

  • Word Sense Disambiguation
  • Part-of-Speech Tagging
  • Named Entity Recognition
  • Machine Translation
  • Information Retrieval
  • Question Answering
  • Text Classification
  • Text Clustering
  • and others

NLP Resources

Lectures

  • Natural Language Processing, by Dan Jurafsky and Christopher Manning
  • Natural Language Processing, by Michael Collins

1054 questions
158 votes, 9 answers

What exactly are keys, queries, and values in attention mechanisms?

How should one understand the keys, queries, and values that are often mentioned in attention mechanisms? I've tried searching online, but all the resources I find only speak of them as if the reader already knows what they are. Judging by the paper…
asked by Sean (2,184)
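
For reference, the mechanism this question asks about is usually written as $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\big(QK^\top/\sqrt{d_k}\big)V$, as in "Attention Is All You Need". A minimal NumPy sketch of that formula (the function name and toy shapes here are illustrative, not from any particular library):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: queries attend over key/value pairs."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

Q = np.random.randn(2, 4)  # 2 queries of dimension d_k = 4
K = np.random.randn(3, 4)  # 3 keys
V = np.random.randn(3, 4)  # 3 values
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```

Each output row is an average of the value vectors, weighted by how well the corresponding query matches each key.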
59 votes, 5 answers

Apply word embeddings to entire document, to get a feature vector

How do I use a word embedding to map a document to a feature vector, suitable for use with supervised learning? A word embedding maps each word $w$ to a vector $v \in \mathbb{R}^d$, where $d$ is some not-too-large number (e.g., 500). Popular word…
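
A common baseline answer to this question is mean pooling: average the vectors of the words that appear in the document. A minimal sketch under that assumption (the toy embedding table is hypothetical):

```python
import numpy as np

def document_vector(tokens, embeddings, d=500):
    """Mean-pool the vectors of the tokens present in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(d)  # fallback for fully out-of-vocabulary documents
    return np.mean(vectors, axis=0)

# toy embedding table; real tables come from word2vec, GloVe, etc.
emb = {"cats": np.random.randn(500), "purr": np.random.randn(500)}
print(document_vector("cats purr loudly".split(), emb).shape)  # (500,)
```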
59 votes, 4 answers

Recurrent vs Recursive Neural Networks: Which is better for NLP?

There are Recurrent Neural Networks and Recursive Neural Networks. Both are usually denoted by the same acronym: RNN. According to Wikipedia, Recurrent NNs are in fact Recursive NNs, but I don't really understand the explanation. Moreover, I don't…
57 votes, 1 answer

Should I normalize word2vec's word vectors before using them?

After training word vectors with word2vec, is it better to normalize them before using them for some downstream applications? I.e., what are the pros/cons of normalizing them?
asked by Franck Dernoncourt (42,093)
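
Whatever the pros and cons, the normalization itself is just a per-vector rescaling; afterwards, dot products equal cosine similarities. A minimal sketch:

```python
import numpy as np

def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit length; dot products then equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)  # eps guards against zero vectors
```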
47 votes, 3 answers

Intuitive difference between hidden Markov models and conditional random fields

I understand that HMMs (Hidden Markov Models) are generative models and CRFs (Conditional Random Fields) are discriminative models. I also understand how CRFs are designed and used. What I do not understand is how they differ from HMMs. I…
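
One way to make the generative/discriminative contrast concrete is to compare the factorizations: an HMM models the joint distribution of observations $x$ and labels $y$, while a linear-chain CRF models the conditional directly:

$$p_{\text{HMM}}(x, y) = \prod_{t=1}^{T} p(y_t \mid y_{t-1})\, p(x_t \mid y_t), \qquad p_{\text{CRF}}(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t=1}^{T} w^\top f(y_t, y_{t-1}, x, t)\Big)$$

Because the CRF only normalizes over label sequences (via $Z(x)$), its features $f$ may look at the whole observation sequence $x$, which the HMM's per-state emission model cannot.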
43 votes, 6 answers

Neural network references (textbooks, online courses) for beginners

I want to learn Neural Networks. I am a Computational Linguist. I know statistical machine learning approaches and can code in Python. I am looking to start with the concepts and to get to know one or two popular models which may be useful from a…
43 votes, 5 answers

LDA vs word2vec

I am trying to understand the similarity between Latent Dirichlet Allocation and word2vec for calculating word similarity. As I understand it, LDA maps words to a vector of probabilities of latent topics, while word2vec maps them to a vector of…
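
To make the two representations in the question concrete: from a fitted LDA model with $K$ topics one can derive a topic-probability vector per word, whereas word2vec learns a dense vector by predicting contexts:

$$v^{\text{LDA}}_w = \big(p(z_1 \mid w), \dots, p(z_K \mid w)\big) \in \Delta^{K-1}, \qquad v^{\text{w2v}}_w \in \mathbb{R}^d$$

The LDA vector lives on the probability simplex and its coordinates are interpretable topics; the word2vec vector's coordinates have no individual meaning, only geometric relations to other words.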
42 votes, 4 answers

Is LSTM (Long Short-Term Memory) dead?

From my own experience, LSTM has a long training time and does not improve performance significantly in many real-world tasks. To make the question more specific, I want to ask when LSTM will work better than other deep NNs (maybe with real-world…
asked by Haitao Du (32,885)
35 votes, 2 answers

Is cosine similarity identical to L2-normalized Euclidean distance?

Identical meaning that it will produce identical results for a similarity ranking between a vector u and a set of vectors V. I have a vector space model with a distance measure (Euclidean distance, cosine similarity) and a normalization technique…
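
The identity behind this question is short enough to state here. For unit vectors $\|u\| = \|v\| = 1$:

$$\|u - v\|^2 = \|u\|^2 - 2\,u \cdot v + \|v\|^2 = 2 - 2\cos(u, v)$$

So after L2 normalization, Euclidean distance is a monotone decreasing function of cosine similarity, and the two produce identical similarity rankings (the distance values themselves differ).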
32 votes, 3 answers

Why do transformers use layer norm instead of batch norm?

Both batch norm and layer norm are common normalization techniques for neural network training. I am wondering why transformers primarily use layer norm.
asked by SantoshGupta7 (629)
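
The key mechanical difference is the axis of normalization: batch norm averages each feature over the batch, while layer norm averages over the features within each example or token. A minimal NumPy sketch (illustrative only, omitting the learned scale and shift parameters):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize over features, independently per token: no batch statistics."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch: output depends on other examples."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(8, 16)  # (tokens in a batch, feature dimension)
print(layer_norm(x).shape, batch_norm(x).shape)  # (8, 16) (8, 16)
```

Layer norm's independence from the batch is commonly cited as one reason it suits transformers, where sequence lengths and effective batch sizes vary.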
26 votes, 3 answers

Topic models and word co-occurrence methods

Popular topic models like LDA usually cluster words that tend to co-occur into the same topic (cluster). What is the main difference between such topic models and other simple co-occurrence-based clustering approaches like PMI? (PMI…
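
For reference, the PMI the question refers to measures how much more often two words co-occur than independence would predict:

$$\mathrm{PMI}(w_1, w_2) = \log \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}$$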
24 votes, 3 answers

Why is skip-gram better for infrequent words than CBOW?

I wonder why skip-gram is better for infrequent words than CBOW in word2vec. I have read the claim on https://code.google.com/p/word2vec/.
asked by Franck Dernoncourt (42,093)
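
The two training objectives make the usual explanation concrete. With context window $c$, skip-gram predicts each context word from the center word, while CBOW predicts the center word from its averaged context:

$$\text{skip-gram: } \sum_{t} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t), \qquad \text{CBOW: } \sum_{t} \log p\big(w_t \mid w_{t-c}, \dots, w_{t+c}\big)$$

A common reading of the claim is that skip-gram gives every occurrence of a rare word its own training pairs, whereas CBOW averages context vectors together, so a rare word's contribution is smoothed out by its (usually more frequent) neighbors.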
23 votes, 1 answer

Has the reported state-of-the-art performance of using paragraph vectors for sentiment analysis been replicated?

I was impressed by the results in the ICML 2014 paper "Distributed Representations of Sentences and Documents" by Le and Mikolov. The technique they describe, called "paragraph vectors", learns unsupervised representations of arbitrarily-long…
19 votes, 2 answers

How is the .similarity method in spaCy computed?

Not sure if this is the right Stack site, but here goes. How does the .similarity method work? Wow, spaCy is great! Its tf-idf model could be easier, but w2v with only one line of code?! In his 10-line tutorial on spaCy, andrazhribernik shows us the…
asked by whs2k (451)
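
Per the spaCy documentation, .similarity returns the cosine between the two objects' vectors, and a Doc's vector defaults to the average of its token vectors. A minimal sketch of that computation (plain NumPy, not spaCy's internals):

```python
import numpy as np

def doc_vector(token_vectors):
    """A Doc's default vector: the average of its token vectors."""
    return np.mean(token_vectors, axis=0)

def similarity(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

tokens_a = np.random.randn(4, 300)  # toy 300-d vectors for a 4-token doc
tokens_b = np.random.randn(6, 300)
print(similarity(doc_vector(tokens_a), doc_vector(tokens_b)))
```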
18 votes, 2 answers

Why does Natural Language Processing not fall under the Machine Learning domain?

I encounter this in many books as well as on the web. Natural Language Processing and Machine Learning are said to be different subsets of Artificial Intelligence. Why is that? We can achieve results of Natural Language Processing by feeding sound patterns to…
asked by user931 (281)