Questions tagged [language-models]

A statistical language model is a probability distribution over sequences of words.

100 questions
16 votes, 1 answer

What are the pros and cons of applying pointwise mutual information on a word cooccurrence matrix before SVD?

One way to generate word embeddings is as follows (mirror): Get a corpus, e.g. "I enjoy flying. I like NLP. I like deep learning." Build the word cooccurrence matrix $X$ from it. Perform SVD on $X$, and keep the first $k$ columns of $U$. Each row…
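A minimal sketch of that pipeline with PMI applied before the SVD (the window size of 1 and all variable names are illustrative assumptions, not the asker's exact setup):

    # Build a co-occurrence matrix, apply positive PMI, then truncate an SVD
    # to obtain k-dimensional word embeddings.
    import numpy as np

    corpus = [s.split() for s in ["I enjoy flying", "I like NLP", "I like deep learning"]]
    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}

    # Symmetric co-occurrence counts within a +/-1 word window.
    X = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - 1), min(len(sent), i + 2)):
                if j != i:
                    X[idx[w], idx[sent[j]]] += 1

    # Positive PMI: max(log(p(w,c) / (p(w) p(c))), 0); zero-count cells stay zero.
    total = X.sum()
    pw = X.sum(axis=1, keepdims=True) / total
    pc = X.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore"):
        ppmi = np.maximum(np.log((X / total) / (pw * pc)), 0)

    # SVD of the (P)PMI matrix; keep the first k columns of U as the embeddings.
    U, S, Vt = np.linalg.svd(ppmi)
    k = 2
    embeddings = U[:, :k]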
16 votes, 3 answers

In Kneser-Ney smoothing, how are unseen words handled?

From what I have seen, the (second-order) Kneser-Ney smoothing formula is in some way or another given as $ \begin{align} P^2_{KN}(w_n|w_{n-1}) &= \frac{\max \left\{ C\left(w_{n-1}, w_n\right) - D, 0\right\}}{\sum_{w'} C\left(w_{n-1}, w'\right)} +…
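For context on the truncated term: in the commonly cited interpolated form of bigram Kneser-Ney (the standard textbook statement, which may differ in detail from the source the asker is quoting), the estimate continues with a back-off weight times a continuation probability:
$$ P^2_{KN}(w_n \mid w_{n-1}) = \frac{\max\left\{ C(w_{n-1}, w_n) - D,\, 0\right\}}{\sum_{w'} C(w_{n-1}, w')} + \lambda(w_{n-1})\, P_{\text{cont}}(w_n), $$
$$ \lambda(w_{n-1}) = \frac{D}{\sum_{w'} C(w_{n-1}, w')}\, \bigl|\{ w' : C(w_{n-1}, w') > 0 \}\bigr|, \qquad P_{\text{cont}}(w_n) = \frac{\bigl|\{ w' : C(w', w_n) > 0 \}\bigr|}{\bigl|\{ (w', w'') : C(w', w'') > 0 \}\bigr|}. $$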
11 votes, 2 answers

Question about Continuous Bag of Words

I'm having trouble understanding this sentence: The first proposed architecture is similar to the feedforward NNLM, where the non-linear hidden layer is removed and the projection layer is shared for all words (not just the projection…
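A minimal sketch of what "the projection layer is shared for all words" amounts to (toy dimensions and random weights, purely illustrative, not Mikolov's implementation): each context word indexes the same embedding matrix, the looked-up vectors are averaged, and the result goes straight to the output softmax with no non-linear hidden layer in between.

    # Toy CBOW forward pass: shared projection, no non-linear hidden layer.
    import numpy as np

    vocab_size, dim = 10, 4
    W_in = np.random.randn(vocab_size, dim) * 0.01   # shared projection (embedding) matrix
    W_out = np.random.randn(dim, vocab_size) * 0.01  # output weights

    def cbow_probs(context_ids):
        h = W_in[context_ids].mean(axis=0)   # average the shared projections of the context words
        scores = h @ W_out
        exp = np.exp(scores - scores.max())
        return exp / exp.sum()               # softmax: probability of each word being the center word

    p = cbow_probs([2, 5, 7, 1])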
10 votes, 3 answers

Regarding using bigram (N-gram) model to build feature vector for text document

A traditional approach to feature construction for text mining is the bag-of-words approach, which can be enhanced using tf-idf to set up the feature vector characterizing a given text document. At present, I am trying to use a bi-gram language model…
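As a concrete illustration of that enhancement (a sketch assuming scikit-learn is acceptable; the documents are made up), switching the vectorizer's n-gram range from unigrams to unigrams-plus-bigrams is enough to add bigram features to the tf-idf vector:

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["the quick brown fox", "the lazy dog", "the quick dog"]

    unigram_vec = TfidfVectorizer(ngram_range=(1, 1))
    bigram_vec = TfidfVectorizer(ngram_range=(1, 2))   # unigrams and bigrams

    X_uni = unigram_vec.fit_transform(docs)
    X_bi = bigram_vec.fit_transform(docs)

    print(X_uni.shape, X_bi.shape)               # the bigram space has more feature columns
    print(bigram_vec.get_feature_names_out())    # includes entries like 'quick brown'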
8 votes, 1 answer

Language modeling: why is adding up to 1 so important?

In many natural language processing applications such as spelling correction, machine translation and speech recognition, we use language models. Language models are usually created by counting how often sequences of words (n-grams) occur in a large…
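A one-line check of why plain counting already yields a proper distribution (maximum-likelihood estimation, before any smoothing is applied): for a fixed history $h$,
$$ \sum_{w} P(w \mid h) = \sum_{w} \frac{C(h, w)}{\sum_{w'} C(h, w')} = \frac{\sum_{w} C(h, w)}{\sum_{w'} C(h, w')} = 1, $$
which is the property the question is asking about.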
7 votes, 2 answers

Is a trigram model guaranteed to perform more accurately than a bigram model?

When implementing an NLP project such as text segmentation or Named Entity Recognition, is a trigram model guaranteed to perform more accurately than a bigram model? $$ \text{Trigram: } p(s_t\mid s_{t-2}, s_{t-1}) $$ $$ \text{Bigram: } p(s_t\mid s_{t-1}) $$ EDIT: I was…
xiaoyao
7 votes, 2 answers

Calculating test-time perplexity for seq2seq (RNN) language models

To compute the perplexity of a language model (LM) on a test sentence $s=w_1,\dots,w_n$ we need to compute all next-word predictions $P(w_1), P(w_2|w_1),\dots,P(w_n|w_1,\dots,w_{n-1})$. My question is: How are these terms computed for a seq2seq…
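For reference, once those conditional probabilities are available (for an RNN/seq2seq decoder they are read off the softmax at each time step while the previous gold words are fed in), the test-sentence perplexity is the standard quantity
$$ \mathrm{PP}(s) = P(w_1, \dots, w_n)^{-1/n} = \exp\!\left( -\frac{1}{n} \sum_{i=1}^{n} \log P(w_i \mid w_1, \dots, w_{i-1}) \right). $$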
6 votes, 2 answers

Neural network language model: predicting the word at the center of the context words, or to their right?

Should a neural network language model predict the word at the center of the context words, or the word to their right? In Bengio's paper, the model predicts the probability of the next word from the preceding n words, e.g. predicting the probabilities of "book", "car", etc., given the n words…
Tom
6 votes, 1 answer

n-gram language model

At the end of the introduction of A Neural Probabilistic Language Model (Bengio et al. 2003), the following example is given: Having seen the sentence The cat is walking in the bedroom in the training corpus should help us generalize to make the…
Antoine
5 votes, 1 answer

Why are Transformers "suboptimal" for language modeling but not for translation?

Language Models with Transformers states: Transformer architectures are suboptimal for language model itself. Neither self-attention nor the positional encoding in the Transformer is able to efficiently incorporate the word-level sequential context…
5 votes, 1 answer

Why can't standard conditional language models be trained left-to-right *and* right-to-left?

From the BERT paper: Unfortunately, standard conditional language models can only be trained left-to-right or right-to-left, since bidirectional conditioning would allow each word to indirectly “see itself”, and the model could trivially predict…
user2740
5 votes, 1 answer

Advantage of character-based language models over word-based

Is there an intuition for why character-based language models are preferred over word-based ones? For example, Karpathy builds his language model by predicting the next character in his blog post (Karpathy Blog). The aspect I am struggling with is that not each…
4 votes, 1 answer

How does one design a custom loss function? What features make a loss function "good"?

I have a custom situation for which I am trying to design a cost function. The idea is that you have a stack of LSTMs doing something slightly unconventional. Each LSTM$_l$ computes a linear transformation of its hidden layer $V_{l-1}h^t_l$ to…
Sam
4 votes, 2 answers

Generating text from language model

I have a trained LSTM language model and want to use it to generate text. The standard approach for this seems to be: apply the softmax function, then take a weighted random choice to determine the next word. This is working reasonably well for me, but it would…
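A minimal sketch of that generation step (illustrative; the logits input stands in for whatever the trained LSTM produces and is not taken from the question), with a temperature knob that is a common tweak on top of plain weighted sampling:

    import numpy as np

    def sample_next(logits, temperature=1.0, rng=np.random.default_rng()):
        # Lower temperature -> sharper distribution (safer text);
        # higher temperature -> flatter distribution (more varied text).
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()                       # softmax
        return rng.choice(len(probs), p=probs)     # weighted random choice over the vocabulary

    next_word_id = sample_next([2.0, 1.0, 0.5, 0.1, -1.0], temperature=0.8)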
4 votes, 1 answer

Skip-gram algorithm confusion

As a newbie to NLP, I am (deeply) confused by the middle step in the following diagram explaining the skip-gram algorithm. The video where this diagram was presented can be found at: https://www.youtube.com/watch?v=ERibwqs9p38 (Highly appreciate…
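Without seeing the exact diagram it is hard to be certain, but the middle step in skip-gram pictures like this is typically the lookup of the center word's vector: a one-hot vector times the input embedding matrix, which reduces to selecting one row. A toy sketch under that assumption (random weights, illustrative only):

    # Skip-gram scoring: the center word's row of W_in is scored against every
    # possible context word via W_out, then normalized with a softmax.
    import numpy as np

    vocab_size, dim = 8, 3
    W_in = np.random.randn(vocab_size, dim) * 0.01    # center-word (input) embeddings
    W_out = np.random.randn(vocab_size, dim) * 0.01   # context-word (output) embeddings

    def skipgram_context_probs(center_id):
        v_c = W_in[center_id]                 # "middle step": one-hot x W_in is just a row lookup
        scores = W_out @ v_c                  # one score per candidate context word
        exp = np.exp(scores - scores.max())
        return exp / exp.sum()                # softmax over the vocabulary

    # Training pairs are (center, context) for each word within the window,
    # e.g. window 1 over word ids [0, 3, 5] gives (0,3), (3,0), (3,5), (5,3).
    p = skipgram_context_probs(3)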