Questions tagged [speech-processing]

Speech processing is the study of speech signals and of the methods used to process them.

The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing applied to speech signals.

A speech signal is one form of audio signal; others include tones, music, and noise. Signal processing methods are usually tested with music, speech, and a mixture of both to evaluate the developed algorithm.

256 questions
10
votes
1 answer

How does noise reduction for speech recognition differ from noise reduction that is supposed to make speech more "intelligible" for humans?

This is a question that has interested me for some time now, mainly because I'm working on noise reduction for an existing speech recognition system myself. Most papers on noise reduction techniques seem to focus on how to make speech more…
9
votes
3 answers

How do I go about detecting whistles, pops and other sounds in live audio input?

I've read many questions on SO, and frankly, none of them describes any particular way to go about it. Some say "do FFT" and some say "zero crossing" etc. But I've only gone as far as understanding that the digital audio input consists…
bad_keypoints
8
votes
2 answers

Calculation of Reverberation Time (RT60) from the Impulse Response

I am somewhat confused about reverberation time (RT60). I need to calculate the reverberation time from a given power envelope. This is what I get. As you might see, the line is curved close to zero (at the top left corner) and then becomes straight. Is…
varunkr
7
votes
1 answer

Why should one use windowing functions for FFT?

So I just revised my pitch calculation algorithm using a harmonic product spectrum algorithm. I was just curious about why this explanation of Harmonic Product Spectrum states that you need to apply a Hanning window to the data set. What would…
Skylion
6
votes
2 answers

Is There a Sparse Representation for Noise?

Is there a sparse representation for stationary and nonstationary noise? How can I learn a dictionary for each noise class? (By noise I mean the kinds of noise that often contaminate speech signals, such as white Gaussian noise, car noise,…
beni
6
votes
1 answer

How do I construct input to neural network from audio signals?

Input: Microphone recordings of digits from 0 to 9 from different speakers. Output: The digit from 0 to 9. I am doing this for fun. So first I will train my neural network using some samples and then use it to classify digits. The problem is every…
5
votes
1 answer

What are i-vectors and x-vectors in the context of Speech Recognition?

I have read that i-vectors and x-vectors are widely used in speaker recognition tasks, but I don't get the difference between them or how exactly they work. Can someone explain it, starting from the basics and moving to the more technical details? I came across the following…
mausamsion
5
votes
2 answers

Harmonics to Noise Ratio Estimation

I want to estimate the harmonics-to-noise ratio (HNR) of a speech signal $x[k]$ using the autocorrelation method. Theoretically, the HNR is given by $$ \mathrm{HNR} = \frac{R_{xx}[T_0]}{R_{xx}[0]-R_{xx}[T_0]} $$ where $R_{xx}$ is the autocorrelation…
kubicwerke
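
The ratio quoted in the excerpt above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the asker's method: the function name `hnr_autocorr`, the pitch search range (`f0_min`, `f0_max`), the use of the biased autocorrelation, and picking $T_0$ as the lag of the largest autocorrelation value in that range are all choices made here, since the excerpt is truncated before it defines them.

```python
import numpy as np

def hnr_autocorr(x, fs, f0_min=75.0, f0_max=500.0):
    """Return (T0 in samples, linear HNR) for one frame of a signal.

    Assumptions (not from the excerpt): biased autocorrelation,
    T0 chosen as the autocorrelation peak within the pitch range.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove DC so it does not dominate R_xx

    # Autocorrelation for non-negative lags: r[k] = sum_n x[n] * x[n + k]
    r = np.correlate(x, x, mode="full")[len(x) - 1:]

    # Restrict the period search to lags matching plausible pitch values
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), len(x) - 1)
    t0 = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))

    # HNR = R_xx[T0] / (R_xx[0] - R_xx[T0]), as in the quoted formula
    hnr = r[t0] / (r[0] - r[t0])
    return t0, hnr
```

The value returned is a linear ratio; converting it with `10 * np.log10(hnr)` gives decibels when the ratio is positive.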
5
votes
3 answers

Signal processing for audio and speech

I have started learning about signals and I am interested in sound signals. There are some questions that I need to resolve. For example, in music, the difference between notes like 'sol' and 'la' is a difference in frequency. But which features of…
virtouso
5
votes
1 answer

Voice Audio Detection algorithm

I have to detect speech intervals in large pre-recorded files in my project. I don't expect much background noise (the audio will be recorded in a room or even in a studio), but there may still be some. So, what are good VAD algorithms…
emptysamurai
4
votes
1 answer

Which Programming Language Should Be Used for Deep Learning (Deep Neural Network [DNN])?

I will be doing voice activity detection and speech enhancement based on a deep neural network. However, I don't know whether to do this in MATLAB or Python. In which programming language can I find more ready-made code on this subject? Which one do you…
4
votes
2 answers

Mel Cepstral Distortion

I am working on a speech synthesis model and I am looking to evaluate my synthesized speech. I found that most people use the Mel Cepstral Distortion (MCD) which can be calculated by the…
MrHat
4
votes
1 answer

How to reduce synthesis artifacts produced by phase vocoder?

I implemented the phase vocoder algorithm in Python to time-stretch speech signals by following this paper and referring to this MATLAB tutorial. I can distinguish words in the original signal from the resynthesis result, and the pitch, as expected,…
Steven Chan
4
votes
1 answer

What Is the Point of Doing the Zero Padding?

What are the advantages and disadvantages of zero-padding, in particular in the case of speech signals?
user38784
4
votes
1 answer

Cepstrum calculus disambiguation

While learning cepstrum analysis for speech recognition, I have come across two different definitions of the cepstrum (for discrete signals): $F^{-1}(\ln|F(x[n])|)$. That is, the cepstrum is the inverse Fourier transform of the logarithm of the magnitude of the…
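
As a minimal sketch, the first definition quoted above (inverse Fourier transform of the log magnitude spectrum) might look like this in NumPy. The function name `real_cepstrum` and the small `eps` floor that avoids `log(0)` are assumptions added here for illustration; the second definition is cut off in the excerpt, so it is not reproduced.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Cepstrum as F^{-1}(ln|F(x[n])|) for a finite real sequence.

    eps is a hypothetical safeguard against taking log of zero.
    """
    spectrum = np.fft.fft(np.asarray(x, dtype=float))
    log_magnitude = np.log(np.abs(spectrum) + eps)
    # The log magnitude of a real signal is real and even, so the
    # inverse DFT is real up to rounding error; keep the real part.
    return np.fft.ifft(log_magnitude).real
```

With this definition, scaling the input by a constant only shifts the zeroth cepstral coefficient (by the logarithm of that constant), which is one reason the low-order coefficients are often treated separately.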