Is Length Normalization used in each step of Beam Search?

Question

In Andrew Ng's lesson on refining Beam Search, it seems that Length Normalization is applied ONLY AFTER THE LAST STEP of Beam Search, that is, once the B most probable sequences have been generated. My question is: would it be better to apply Length Normalization at EACH step of Beam Search?

Lerner Zhang · Answer 1 · 2021-09-20T14:43:47.270

Yes. Length normalization is applied at each step of beam search; otherwise hypotheses of different lengths cannot be compared fairly during the later steps. At each step, the score is calculated this way:

$$ \mathrm{score}(y) = - \frac{1}{T} \log P(y \mid x) = \frac{1}{T} \sum_{i=1}^T - \log P(y_i \mid y_1, \ldots, y_{i-1}, x)$$
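To make the effect of the $\frac{1}{T}$ factor concrete, here is a minimal Python sketch. The function names and the per-token probabilities are made up for illustration; lower scores are better because the score is a negative log-probability.

```python
import math

def unnormalized_score(token_probs):
    """Summed negative log-probability of a sequence (lower is better)."""
    return sum(-math.log(p) for p in token_probs)

def normalized_score(token_probs):
    """Same sum divided by the sequence length T, as in the formula above."""
    return unnormalized_score(token_probs) / len(token_probs)

# Illustrative per-token probabilities (not from a real model).
short_hyp = [0.5, 0.5]            # T = 2, each token fairly unlikely
long_hyp = [0.6, 0.6, 0.6, 0.6]   # T = 4, each token more likely

# Without normalization the shorter hypothesis wins just because it has fewer terms.
print(unnormalized_score(short_hyp) < unnormalized_score(long_hyp))  # True

# With normalization the hypothesis with better per-token probability wins.
print(normalized_score(long_hyp) < normalized_score(short_hyp))      # True
```

Without the division by T, every extra token adds a positive term, so longer hypotheses are penalized regardless of how good each token is.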

For hypotheses that have already stopped, T (the number of tokens generated so far) no longer increases, so their scores stay fixed and remain comparable with hypotheses that have not yet produced a stop token.
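A rough sketch of how this could look in code, assuming a toy stand-in for the model: `toy_next_token_probs`, the `EOS` marker, and the default parameters are all hypothetical and exist only for illustration. Finished hypotheses are carried over unchanged and ranked together with active ones by their per-token score.

```python
import math

EOS = "</s>"  # hypothetical stop token

def toy_next_token_probs(prefix):
    """Hypothetical stand-in for a model: P(next token | prefix)."""
    if len(prefix) >= 3:
        return {EOS: 1.0}
    return {"a": 0.5, "b": 0.3, EOS: 0.2}

def beam_search(next_probs, beam_width=2, max_steps=5):
    # Each hypothesis is (tokens, summed negative log-probability).
    beam = [((), 0.0)]
    for _ in range(max_steps):
        candidates = []
        for tokens, nll in beam:
            if tokens and tokens[-1] == EOS:
                # Finished hypothesis: kept as-is, so its T stays fixed.
                candidates.append((tokens, nll))
                continue
            for tok, p in next_probs(tokens).items():
                candidates.append((tokens + (tok,), nll - math.log(p)))
        # Length normalization at every step: rank by nll / T (lower is better).
        candidates.sort(key=lambda h: h[1] / len(h[0]))
        beam = candidates[:beam_width]
        if all(tokens[-1] == EOS for tokens, _ in beam):
            break
    return beam

beam = beam_search(toy_next_token_probs)
for tokens, nll in beam:
    print(tokens, round(nll / len(tokens), 3))
```

This only illustrates the scheme described in this answer; real implementations differ in how they store finished hypotheses and in the exact normalization they use.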
