At the end of the introduction of A Neural Probabilistic Language Model (Bengio et al. 2003), the following example is given:

Having seen the sentence The cat is walking in the bedroom in the training corpus should help us generalize to make the sentence A dog was running in a room almost as likely.

I get the general spirit, but they give this example right after explaining that an n-gram language model gives the probability of a word given the previous (context) words: $P[w_{t}\mid (w_{t-n+1},...,w_{t-1})]$. So switching to the probability of a sentence without any transition is a bit confusing.

What do they mean by "making a sentence likely" since the models work at the word level?

PS: I understand that if we see The cat is walking in the bedroom in the training corpus, we can estimate $p_{0}=P[bedroom\mid (cat,walking)]$. Taking word similarity into account, it is clear that when generalizing we would want $P[bedroom\mid (dog,running)]$ to be roughly equal to $p_{0}$ (since dog and cat, and walking and running, are similar). But this still concerns word probabilities. Moreover, $bedroom$ does not occur in A dog was running in a room, so there we only deal with $P[room\mid(dog,running)]$.

Franck Dernoncourt
Antoine

1 Answer

Language models are often used to compute the probability of a sentence. This is done by using the chain rule.

For example, if we want to estimate the probability of observing the sentence $w_1 w_2 w_3 w_4$, we can factorize it as follows:

$P(w_1, w_2, w_3, w_4) = P(w_4|w_3, w_2, w_1) P(w_3|w_2, w_1) P(w_2| w_1) P(w_1) $

Each of those terms is something that can be straightforwardly computed by the language model.
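To make the factorization concrete, here is a minimal sketch in Python. It assumes a bigram model (each word conditioned only on the previous word, i.e. the n-gram approximation with n = 2) estimated by counting on a two-sentence toy corpus with add-one smoothing. The corpus and the names `cond_prob` and `sentence_log_prob` are invented for illustration and are not taken from Bengio et al.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration); <s> marks the start of a sentence.
sentences = [
    "<s> the cat is walking in the bedroom",
    "<s> the dog is running in the room",
]
tokens = [s.split() for s in sentences]

# Count how often each word is used as a context, and each (prev, word) pair.
contexts = Counter(prev for sent in tokens for prev in sent[:-1])
bigrams = Counter(pair for sent in tokens for pair in zip(sent, sent[1:]))
vocab = sorted({w for sent in tokens for w in sent})

def cond_prob(word, prev):
    """P(word | prev) with add-one smoothing over the toy vocabulary."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def sentence_log_prob(sentence):
    """Chain rule with a one-word context: sum of log P(w_t | w_{t-1})."""
    words = ["<s>"] + sentence.split()
    return sum(math.log(cond_prob(w, prev)) for prev, w in zip(words, words[1:]))

seen = sentence_log_prob("the cat is walking in the bedroom")
shuffled = sentence_log_prob("bedroom the in walking is cat the")
print(seen, shuffled)
```

The model only ever answers word-level questions (`cond_prob`), yet multiplying those answers (here, summing their logs) gives a score for the whole sentence. On this toy data the training sentence scores higher than the same words in reversed order.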

Aaron
    @Aaron - Would you please share some details about the usefulness of calculating the probability of a sentence? Like Antoine, I could not get my head around it: the need for the conditional probability of a word seems natural, but I am not sure about the joint probability of a sentence. – KGhatak Mar 08 '19 at 13:18
    Let's say you are working on speech recognition and your system can't tell if someone said "wreck a nice beach" or if they said "recognize speech". By running the language model on each utterance you could pick which word sequence is the most likely. – Aaron Mar 11 '19 at 16:04
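
The rescoring idea in the comment above can be sketched as follows. This is only an illustration under strong assumptions: the training text is made up, and the scorer is a deliberately crude add-one smoothed unigram model standing in for a real language model. Since it multiplies one probability per word, it also favors shorter candidates, which a real system would have to account for. The names `log_prob` and `best` are hypothetical.

```python
import math
from collections import Counter

# Toy training text (made up for illustration, not real speech data).
corpus = (
    "we want to recognize speech . speech is hard to recognize . a nice day"
).split()
counts = Counter(corpus)
vocab_size = len(counts) + 1  # +1 slot for any word never seen in the corpus

def log_prob(words):
    """Add-one smoothed unigram log-probability of a word sequence."""
    total = len(corpus)
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)

# Two transcriptions the recognizer cannot tell apart acoustically.
candidates = ["recognize speech", "wreck a nice beach"]

# Keep the candidate that the language model considers most likely.
best = max(candidates, key=lambda s: log_prob(s.split()))
print(best)
```

Any model that assigns a probability to a whole word sequence can be plugged in for `log_prob`, including one built from the chain-rule factorization in the answer.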