From a deep learning practitioner's point of view, is there a lower limit on the number of hours of speech needed to train a neural net to transcribe speech to text? One estimate from CMU is 3,000 to 5,000 hours of speech for commercial-quality speech recognition at 90% accuracy. Is there a minimum amount of information needed to reproduce the complexity of the actual language? That is, if you had 5,000 hours of speech and compressed it with a neural net, there would be some minimum network size needed to do a good job at speech recognition. You could call that the "bit complexity of a natural language". Does inverting that compression ratio tell you the minimum number of hours of speech you would need to train a commercial-quality speech recognition system? For context, Shannon's classic paper "Prediction and Entropy of Printed English" seems relevant.
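
To make the "bit complexity" idea a little more concrete on the text side, here is a rough sketch of the simplest statistic in the spirit of Shannon's paper: the N-gram estimate F_N of the entropy of a character given the preceding N-1 characters, computed as the entropy of N-character blocks minus the entropy of (N-1)-character blocks. The function names are hypothetical, the plug-in counts are biased on small samples (the estimate can even dip slightly below zero), and it measures only text, not audio, so treat it as an illustration under those assumptions rather than an answer to the hours-of-speech question.

```python
import math
from collections import Counter


def block_entropy(text: str, n: int) -> float:
    """Entropy in bits of the empirical distribution of length-n blocks of text.

    Hypothetical helper: a plug-in estimate from raw counts, no smoothing.
    """
    if n == 0:
        return 0.0
    blocks = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())


def conditional_entropy(text: str, n: int) -> float:
    """Rough F_N estimate: bits per character given the previous n-1 characters.

    Computed as H(n-grams) - H((n-1)-grams); biased on small samples.
    """
    return block_entropy(text, n) - block_entropy(text, n - 1)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 50
    for n in range(1, 5):
        print(n, round(conditional_entropy(sample, n), 3))
```

On a corpus large enough for the counts to be reliable, F_N typically falls as N grows, which is the sense in which longer context makes language more compressible; with a tiny or repetitive sample like the one above, it mostly reflects the sample rather than the language.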

Lars Ericson

0 Answers0