I have a question: is padding before feature extraction with VGGish good practice? Our padding technique is to find the longest signal (a loaded .wav signal) and then pad every shorter signal with zeros up to the length of the longest one. We need this because we want input data of a single, fixed size. Are there other techniques you would recommend? The difference in accuracy between padding before and padding after feature extraction is large: more than 20%. Padding before extraction gives 97% accuracy. I'd appreciate your feedback: why does this happen, and is this kind of padding the right approach, or is there a better solution?
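A minimal sketch of the two options being compared, assuming each clip is a 1-D NumPy array of mono samples and assuming the zeros are appended at the end (the question does not say where they go). `pad_waveforms` pads the raw signals before extraction; `pad_embeddings` pads after extraction, assuming VGGish has already turned each clip into an `(n_frames, 128)` array with one 128-dimensional embedding per analysis window. Both function names are hypothetical.

```python
import numpy as np


def pad_waveforms(signals):
    """Padding BEFORE extraction: zero-pad 1-D waveforms to the longest one."""
    max_len = max(len(s) for s in signals)
    out = np.zeros((len(signals), max_len), dtype=np.float32)
    for i, s in enumerate(signals):
        out[i, : len(s)] = s  # original samples first, zeros after (assumed)
    return out


def pad_embeddings(embedding_seqs):
    """Padding AFTER extraction: zero-pad (n_frames, dim) embedding arrays
    along the time axis to the largest n_frames."""
    max_frames = max(e.shape[0] for e in embedding_seqs)
    dim = embedding_seqs[0].shape[1]
    out = np.zeros((len(embedding_seqs), max_frames, dim), dtype=np.float32)
    for i, e in enumerate(embedding_seqs):
        out[i, : e.shape[0]] = e
    return out


# Toy data standing in for real clips and real VGGish output
waves = pad_waveforms([np.ones(3), np.ones(5)])
embs = pad_embeddings([np.ones((2, 128)), np.ones((4, 128))])
print(waves.shape, embs.shape)  # (2, 5) (2, 4, 128)
```

With either function, the amount of zero padding is determined by each clip's original length, so any length information that correlates with the label survives into the model's input.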
Are you saying that without padding, performance degrades by 20%? That seems highly suspicious. Be very careful if there is some class dependence in the length of your audio signals: then the padding might give the model a way to cheat. – Jon Nordby May 04 '21 at 12:01
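One quick way to check for this kind of class dependence is to see how well clip length alone predicts the label. The sketch below is a hypothetical helper, not part of VGGish: it assigns each clip to the class whose mean length is closest, and an accuracy far above chance suggests that padding could leak the label. It scores on the same data it fits, so treat the result as an optimistic diagnostic rather than a proper evaluation.

```python
import numpy as np


def length_only_accuracy(lengths, labels):
    """Accuracy of a trivial classifier that only sees clip length:
    each clip is assigned to the class whose mean length is closest."""
    lengths = np.asarray(lengths, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    class_means = np.array([lengths[labels == c].mean() for c in classes])
    # Distance from every clip to every class mean, shape (n_clips, n_classes)
    dist = np.abs(lengths[:, None] - class_means[None, :])
    predicted = classes[dist.argmin(axis=1)]
    return float((predicted == labels).mean())


# Toy example: class "b" clips are much longer than class "a" clips
print(length_only_accuracy([16000, 17600, 80000, 81600], ["a", "a", "b", "b"]))  # 1.0
```

If the lengths are passed in samples, they can be read straight from the loaded .wav arrays before any padding is applied.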
I don't see why one would pad the audio waveform when extracting features with VGGish. The only case where I would do it is when the waveform is shorter than one analysis window (960 ms). – Jon Nordby May 04 '21 at 12:03
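If padding is only needed for clips shorter than one analysis window, it can be applied per clip instead of padding everything to the longest clip in the dataset. This is a sketch under stated assumptions: mono audio already resampled to 16 kHz (the rate VGGish's preprocessing works at), and a minimum length taken from the 960 ms figure in the comment. Depending on how the front end frames the STFT, slightly more than 960 ms may be needed to yield a full example, so verify `min_samples` against your feature extractor. `pad_to_min_window` is a hypothetical helper.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumption: audio already resampled to 16 kHz mono
WINDOW_SECONDS = 0.96  # analysis window length from the comment
DEFAULT_MIN_SAMPLES = int(round(SAMPLE_RATE * WINDOW_SECONDS))  # 15360


def pad_to_min_window(waveform, min_samples=DEFAULT_MIN_SAMPLES):
    """Zero-pad a clip at the end only if it is shorter than min_samples;
    longer clips are returned unchanged."""
    waveform = np.asarray(waveform, dtype=np.float32)
    missing = min_samples - len(waveform)
    if missing > 0:
        waveform = np.pad(waveform, (0, missing))  # constant (zero) padding
    return waveform


print(len(pad_to_min_window(np.ones(8000))))   # 15360
print(len(pad_to_min_window(np.ones(48000))))  # 48000
```

Clips longer than the window are left untouched, so they still yield a variable number of per-window embeddings; bringing those to a fixed size is a separate step after extraction.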