I'm planning to use MFCCs extracted from audio signals to build a speaker recognizer. I noticed that the first MFCC coefficient tends to be very large compared to the others, so I think normalization is needed before feeding the features to a machine learning algorithm (an LSTM and an HMM in my case). My idea is to bring the MFCC values into a range such as (-0.5, 0.5) or (-1, 1).
I tried standardization, (mfccs - mean) / std,
and I'm currently trying min-max normalization.
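A minimal NumPy sketch of the two normalizations being compared. It assumes the MFCCs are stored as an array of shape (n_frames, n_coeffs) and that each coefficient (column) is normalized separately; the helper names `zscore` and `minmax` are hypothetical, and the data is synthetic, not real MFCCs:

```python
import numpy as np

def zscore(mfccs, eps=1e-8):
    """Standardize each coefficient (column) to zero mean and unit variance.

    Assumes mfccs has shape (n_frames, n_coeffs).
    """
    mean = mfccs.mean(axis=0)
    std = mfccs.std(axis=0)
    return (mfccs - mean) / (std + eps)  # eps guards against a constant column

def minmax(mfccs, low=-1.0, high=1.0, eps=1e-8):
    """Linearly rescale each coefficient (column) into [low, high]."""
    cmin = mfccs.min(axis=0)
    cmax = mfccs.max(axis=0)
    scaled01 = (mfccs - cmin) / (cmax - cmin + eps)  # now in [0, 1]
    return low + scaled01 * (high - low)

# Synthetic stand-in for MFCCs: coefficient 0 is much larger than the rest.
rng = np.random.default_rng(0)
mfccs = rng.normal(size=(200, 13))
mfccs[:, 0] = mfccs[:, 0] * 50 - 300

z = zscore(mfccs)
m = minmax(mfccs)
print(z.mean(axis=0).round(3))       # approximately 0 for every coefficient
print(m.min(axis=0), m.max(axis=0))  # -1 and (just under) 1 per coefficient
```

Both bring coefficient 0 onto the same scale as the others. Note that z-scores are not bounded: for roughly Gaussian data about a third of the values fall outside (-1, 1), so only min-max gives a fixed range like the one described above.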
I know how each of these methods is calculated, but what are the practical differences between them (or any other normalization method) when the features are used with a machine learning algorithm?
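One concrete difference shows up when the statistics are computed on training data and then reused on unseen data. The sketch below assumes that setup; `fit_stats`, `apply_zscore` and `apply_minmax` are hypothetical helpers, and the outlier frame is injected artificially for illustration:

```python
import numpy as np

def fit_stats(train):
    """Per-coefficient statistics, computed once on training frames only."""
    return {
        "mean": train.mean(axis=0), "std": train.std(axis=0),
        "min": train.min(axis=0), "max": train.max(axis=0),
    }

def apply_zscore(x, s, eps=1e-8):
    return (x - s["mean"]) / (s["std"] + eps)

def apply_minmax(x, s, eps=1e-8):
    # Maps the *training* range onto [-1, 1]; unseen data can land outside it.
    return -1.0 + 2.0 * (x - s["min"]) / (s["max"] - s["min"] + eps)

rng = np.random.default_rng(1)
train = rng.normal(size=(1000, 13))
test = rng.normal(size=(200, 13))
test[0, :] = 8.0  # hypothetical outlier frame, e.g. a click or noise burst

s = fit_stats(train)
z_test = apply_zscore(test, s)
m_test = apply_minmax(test, s)

# The [-1, 1] guarantee of min-max only holds for the data it was fitted on.
print("min-max test range:", m_test.min(), m_test.max())
print("z-score test range:", z_test.min(), z_test.max())
```

Here the outlier frame maps well above 1 under min-max. More generally, min-max tends to be more sensitive to a single extreme value, because the range is set by that one value, whereas the standard deviation is averaged over all frames.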