
I am looking for a library, script, or program that can normalize the transcribed and gold texts when computing the word error rate (WER) of an automatic speech recognition system.

For example, if:

  • the gold transcript is "Without the dataset the article is useless"
  • the predicted transcript is "Without the data set the article's useless"

the texts should be normalized so that the WER is 0 (rather than the 3 or 4 errors counted when the texts aren't normalized).
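To make the intent concrete, here is a minimal, self-contained sketch of the kind of normalization I have in mind, followed by a plain Levenshtein-based WER. The CONTRACTIONS and COMPOUNDS tables and the normalize/wer functions are ad hoc names I made up for illustration; they are not taken from any existing library.

import re

# Hypothetical rule tables for illustration only; a real solution would need
# far more rules (or a smarter approach), which is exactly what I'm asking for.
CONTRACTIONS = {"article's": "article is"}  # "'s" is ambiguous in general (possessive vs. "is")
COMPOUNDS = {"data set": "dataset"}

def normalize(text: str) -> list[str]:
    """Lowercase, expand known contractions, merge known compounds,
    strip punctuation (except apostrophes), and return the token list."""
    text = text.lower()
    for src, dst in CONTRACTIONS.items():
        text = text.replace(src, dst)
    for src, dst in COMPOUNDS.items():
        text = text.replace(src, dst)
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()

def wer(ref: list[str], hyp: list[str]) -> float:
    """Word error rate via the standard Levenshtein distance over tokens."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

gold = "Without the dataset the article is useless"
pred = "Without the data set the article's useless"
print(wer(normalize(gold), normalize(pred)))  # 0.0 after normalization

The question is whether something like this already exists with reasonably complete rule sets, rather than me maintaining the tables by hand.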


I have crossposted the question at:


1 Answer


The hubscr.pl tool from sclite uses GLM files for normalization; you can download an example here or here.

The syntax of GLM files is described here (mirror).

GLM files are essentially a regular-expression style of normalization in which you have to list every possible expansion, so they aren't fully generic (see the parsing sketch after the excerpt below).

Excerpt from en20030506.glm (mirror):

[WINNER'S] => [{WINNER'S / WINNER IS / WINNER HAS}] / [ ] __ [ ]
[WINTER'S] => [{WINTER'S / WINTER IS / WINTER HAS }] / [ ] __ [ ]
[WISCONSIN'S] => [{WISCONSIN'S / WISCONSIN IS / WISCONSIN HAS}] / [ ] __ [ ]
[WIT'S] => [{WIT'S / WIT IS / WIT HAS}] / [ ] __ [ ]
[WOMAN'S] => [{WOMAN'S / WOMAN IS / WOMAN HAS}] / [ ] __ [ ]
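As a rough sketch (this is not sclite itself), the alternation rules of the form shown above can be parsed into a table mapping each surface form to its allowed expansions. The regular expression below only targets this "[X] => [{A / B / C}] / [ ] __ [ ]" pattern; real GLM files contain other rule types that it ignores.

import re

RULE_RE = re.compile(r"\[(?P<lhs>[^\]]+)\]\s*=>\s*\[\{(?P<alts>[^}]+)\}\]")

def parse_glm_rules(lines):
    """Return {surface form: [allowed expansions]} for alternation rules."""
    rules = {}
    for line in lines:
        m = RULE_RE.search(line)
        if m:
            lhs = m.group("lhs").strip()
            alts = [a.strip() for a in m.group("alts").split("/")]
            rules[lhs] = alts
    return rules

excerpt = [
    "[WINNER'S] => [{WINNER'S / WINNER IS / WINNER HAS}] / [ ] __ [ ]",
    "[WIT'S] => [{WIT'S / WIT IS / WIT HAS}] / [ ] __ [ ]",
]
print(parse_glm_rules(excerpt))
# {"WINNER'S": ["WINNER'S", 'WINNER IS', 'WINNER HAS'],
#  "WIT'S": ["WIT'S", 'WIT IS', 'WIT HAS']}

During scoring, any of the listed alternatives is accepted as a correct match, so a reference "WINNER'S" is not penalized whether the hypothesis contains "WINNER'S" or "WINNER IS".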