I have a dataset ready for speech recognition. It contains 10 different speakers, each uttering the English digits 0 to 9, with 5 samples of each digit.

So how do I validate my dataset by using k-fold cross validation?

Thank you.

Jacky

1 Answer

If your sample space includes only 50 samples (5 per digit), your dataset is too small and you need more data, either by recording more or by generating it through data augmentation. Even if it is 500 samples (5 per speaker per digit), you will need more data to get better accuracy, but you can start experimenting.

There is more on dataset sizes in this discussion. Regarding k-fold cross-validation, you can follow this documentation if you use Python.
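As a rough illustration of the k-fold idea for this dataset, here is a minimal sketch using only the Python standard library. It assumes each recording can be described by a hypothetical `(speaker, digit, take)` tuple, and it holds out whole speakers in each fold so that the test speakers are never seen during training. That speaker-wise grouping is my assumption about what you want to measure, not the only valid way to split, and `speaker_k_fold` is a made-up helper name.

```python
import random
from collections import defaultdict

# Hypothetical dataset layout: one (speaker, digit, take) tuple per recording.
# 10 speakers x 10 digits x 5 takes = 500 recordings.
samples = [(spk, digit, take)
           for spk in range(10)
           for digit in range(10)
           for take in range(5)]


def speaker_k_fold(samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, holding out whole speakers per fold."""
    by_speaker = defaultdict(list)
    for idx, (spk, _digit, _take) in enumerate(samples):
        by_speaker[spk].append(idx)

    # Shuffle the speakers once, then deal them into k roughly equal folds.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    folds = [speakers[i::k] for i in range(k)]

    for held_out in folds:
        held = set(held_out)
        test_idx = [i for s in held_out for i in by_speaker[s]]
        train_idx = [i for s in speakers if s not in held
                     for i in by_speaker[s]]
        yield train_idx, test_idx


splits = list(speaker_k_fold(samples, k=5))
for fold, (train_idx, test_idx) in enumerate(splits):
    # Train on train_idx, evaluate on test_idx, then average the k scores.
    print(f"fold {fold}: {len(train_idx)} train, {len(test_idx)} test")
```

If you would rather measure performance on known speakers, split over individual recordings instead of over speakers. Libraries such as scikit-learn ship ready-made splitters (`KFold`, `StratifiedKFold`, `GroupKFold`) that cover both cases.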

Vaibhav Arora
  • Can I augment my samples with different SNR values to expand my dataset? For example: -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB and 20 dB? – Jacky Mar 01 '17 at 05:05
  • I think this is mostly done when you want to make your models more robust to noise. That said, I don't think varying the SNR would help you, because without sufficient data the model cannot become invariant to different utterances of the same word. This also depends on which model you are using. (It wouldn't hurt to try, though.) There are other ways to augment a dataset, e.g. changing the tempo; see this link: http://speak.clsp.jhu.edu/uploads/publications/papers/1050_pdf.pdf. Hope this helps! – Vaibhav Arora Mar 01 '17 at 06:15
  • Thanks! If this answer solved your problem, please accept it. It helps with the reputation :) – Vaibhav Arora Mar 03 '17 at 11:05
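
For anyone who wants to try the SNR augmentation discussed in the comments above, here is a minimal sketch of mixing noise into a clean signal at a target SNR, using only the standard library. The sine wave and Gaussian noise are placeholders for real recordings, the 16 kHz sample rate is an assumption, and `mix_at_snr` is a hypothetical helper name.

```python
import math
import random


def mix_at_snr(signal, noise, snr_db):
    """Return signal + scaled noise so that the mix has the requested SNR (dB)."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(p_signal / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(signal, noise)]


rng = random.Random(0)
# Placeholder waveforms: one second at an assumed 16 kHz sample rate.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# One noisy copy per SNR level mentioned in the question.
snr_levels = (-10, -5, 0, 5, 10, 15, 20)
augmented = {snr: mix_at_snr(clean, noise, snr) for snr in snr_levels}
```

When you combine this with cross-validation, keep every augmented copy in the same fold as the recording it came from. Otherwise near-duplicates of a test utterance end up in the training set and inflate the score.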