In Andrew Ng's lesson on refining beam search, length normalization appears to be applied only after the last step of beam search, that is, once the B most probable sequences have been generated. My question is: would it be better to apply length normalization at each step of beam search?
Yes. Length normalization is applied at each step of beam search; otherwise hypotheses of different lengths cannot be compared fairly in later steps. At each step, the score of a hypothesis y with T tokens generated so far is calculated this way:
$$ \mathrm{score}(y) = -\frac{1}{T} \log P(y \mid x) = \frac{1}{T} \sum_{i=1}^T - \log P(y_i \mid y_1, \ldots, y_{i-1}, x)$$
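Since this score is an average negative log-likelihood, lower is better. For hypotheses that have already stopped, T no longer increases, so their scores stay fixed and remain comparable with those of hypotheses that have not yet produced a stop token.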
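To make the per-step normalization concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation from the lesson: the `EOS` marker, the `next_log_probs` callback, and the `toy_model` distribution are all hypothetical names invented for this example. Hypotheses are ranked by the length-normalized negative log-likelihood at every step, and finished hypotheses are carried forward with their T frozen.

```python
import math

EOS = "</s>"  # hypothetical end-of-sequence marker (an assumption for this sketch)


def normalized_score(log_probs):
    """Average negative log-probability per token; lower is better."""
    return -sum(log_probs) / len(log_probs)


def beam_search(next_log_probs, beam_size, max_steps):
    """Beam search that re-ranks by the length-normalized score at every step.

    next_log_probs is a hypothetical callback: it takes the token list so far
    and returns a dict mapping each candidate next token to its log-probability.
    """
    # Each hypothesis is a pair: (tokens, per-token log-probabilities).
    beams = [([], [])]
    for _ in range(max_steps):
        candidates = []
        for tokens, lps in beams:
            if tokens and tokens[-1] == EOS:
                # Finished hypothesis: T is frozen, so its score stays fixed.
                candidates.append((tokens, lps))
                continue
            for tok, lp in next_log_probs(tokens).items():
                candidates.append((tokens + [tok], lps + [lp]))
        # Normalize by the current length T before pruning to the beam size.
        candidates.sort(key=lambda hyp: normalized_score(hyp[1]))
        beams = candidates[:beam_size]
        if all(tokens[-1] == EOS for tokens, _ in beams):
            break
    return beams


def toy_model(prefix):
    """Hypothetical fixed next-token distribution, ignoring the prefix."""
    probs = {"a": 0.6, "b": 0.3, EOS: 0.1}
    return {tok: math.log(p) for tok, p in probs.items()}


beams = beam_search(toy_model, beam_size=2, max_steps=3)
best_tokens, best_log_probs = beams[0]
```

Because finished and unfinished hypotheses are both scored per token, they can share one sorted list at each step. A real decoder would typically batch the model calls and may use a softer normalization such as dividing by T raised to a power below 1; that variant is not shown here.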

Lerner Zhang