
In Andrew Ng's lesson on refining Beam Search, it seems that Length Normalization is applied only after the last step of Beam Search, that is, once the B most probable sequences have been generated. My question is: would it be better to apply Length Normalization at each step of Beam Search?

cjbayron

1 Answer


Yes. Length normalization is applied at each step of Beam Search; otherwise hypotheses of different lengths cannot be compared fairly in the later steps. At each step, the score of a hypothesis is its average negative log-probability per token (lower is better):

$$ \mathrm{score}(y) = -\frac{1}{T} \log P(y \mid x) = \frac{1}{T} \sum_{i=1}^T - \log P(y_i \mid y_1, \ldots, y_{i-1}, x)$$

For hypotheses that have already stopped, T no longer increases, so they remain comparable with hypotheses that have not yet reached the end-of-sequence token.
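As a rough illustration of why the normalization matters, here is a minimal Python sketch. The function name `normalized_score` and the token probabilities are made up for the example and do not come from the lesson. It compares a finished 2-token hypothesis with a still-growing 4-token one, using both the raw summed score and the length-normalized one.

```python
import math

def normalized_score(token_log_probs):
    """Average negative log-probability per token (lower is better).

    token_log_probs: list of log P(y_i | y_1..y_{i-1}, x), one per generated token.
    """
    T = len(token_log_probs)
    return -sum(token_log_probs) / T

# Hypothetical per-token probabilities, chosen only for illustration.
finished = [math.log(0.5)] * 2   # stopped after 2 tokens; T stays at 2
active = [math.log(0.6)] * 4     # still growing; T is currently 4

# Raw (unnormalized) negative log-probabilities: longer sequences are penalized.
raw_finished = -sum(finished)
raw_active = -sum(active)

print("raw:       ", raw_finished, raw_active)
print("normalized:", normalized_score(finished), normalized_score(active))
```

With these assumed numbers, the raw score favors the shorter finished hypothesis simply because it has fewer factors, while the normalized score favors the longer one, whose individual tokens are more probable on average.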

Lerner Zhang