Questions tagged [speech-processing]

Speech processing is the study of speech signals and of the methods used to process them. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing applied to speech signals.

A speech signal is one form of audio signal; others include tones, music, and noise. A newly developed signal processing algorithm is usually tested on music, speech, and a mixture of both.

256 questions

1 answer

How does noise reduction for speech recognition differ from noise reduction that is supposed to make speech more "intelligible" for humans?

This is a question that has interested me for some time now, mainly because I'm working on noise reduction for an existing speech recognition system myself. Most papers on noise reduction techniques seem to focus on how to make speech more…

asked Jul 14 '17 at 14:17

marlonfl

3 answers

How do I go about detecting whistles, pops and other sounds in live audio input?

I've read many questions on SO, and frankly, none of them describes any particular way to go about it. Some say "do an FFT" and some say "zero crossing" etc. But I've only gone as far as understanding that the digital audio input consists…

fft audio speech-processing

asked May 31 '13 at 05:11

bad_keypoints

2 answers

Calculation of Reverberation Time (RT60) from the Impulse Response

I am somewhat confused about reverberation time (RT60). I need to calculate the reverberation time from a given power envelope. This is what I get. As you can see, the line is curved close to zero (at the top left corner) and then becomes straight. Is…

impulse-response speech-processing acoustics

asked Jun 30 '14 at 05:23

varunkr

1 answer

Why should one use windowing functions for FFT?

So I just revised my pitch calculation algorithm using a harmonic product spectrum algorithm. I was just curious about why this explanation of the Harmonic Product Spectrum states that you need to apply a Hanning window to the data set. What would…

fft frequency sound window-functions speech-processing

asked Oct 26 '13 at 20:59

Skylion

2 answers

Is There a Sparse Representation for Noise?

Is there a sparse representation for stationary noise and nonstationary noise? How can I learn a dictionary for each noise class? (By noise I mean the noises that often contaminate speech signals, such as white Gaussian noise, car noise,…

speech-processing sparsity sparse-model

asked Jul 27 '18 at 15:33

beni

1 answer

How do I construct input to neural network from audio signals?

Input: Microphone recordings of digits from 0 to 9 from different speakers. Output: The digit from 0 to 9. I am doing this for fun. So first I will train my neural network using some samples and then use it to classify digits. The problem is every…

audio fourier-transform speech-processing

asked Aug 10 '14 at 19:57

Pratik Deoghare

1 answer

What are i-vectors and x-vectors in the context of Speech Recognition?

I have read that i-vectors and x-vectors are widely used in speaker recognition tasks, but I don't understand the difference between them or how exactly they work. Can someone explain it, starting from the basics and then getting a bit technical? I came across following…

speech-processing speech-recognition speech

asked Jun 24 '19 at 01:58

mausamsion

2 answers

Harmonics to Noise Ratio Estimation

I want to estimate the Harmonics to Noise Ratio (HNR) of a speech signal $x[k]$ using the autocorrelation method. Theoretically, HNR is given as $$ \mathrm{HNR} = \frac{R_{xx}[T_0]}{R_{xx}[0]-R_{xx}[T_0]} $$ where $R_{xx}$ is the autocorrelation…

autocorrelation estimation speech-processing

asked May 15 '18 at 11:36

kubicwerke

3 answers

Signal processing for audio and speech

I have started learning about signals and I am interested in sound signals. There are some questions that I need to resolve. For example, in music, the difference between notes like 'sol' and 'la' is a difference in frequency. But which features of…

audio sound speech-processing

asked Oct 07 '16 at 23:12

virtouso

1 answer

Voice Audio Detection algorithm

I have to detect speech intervals in large pre-recorded files in my project. I don't think there will be much background noise (the audio will be recorded in a room or even in a studio), but there may still be some. So, what are good VAD algorithms…

audio algorithms speech speech-processing voice

asked Feb 18 '14 at 15:37

emptysamurai

1 answer

Which Programming Language Should Be Used for Deep Learning (Deep Neural Network [DNN])?

I will do voice activity detection and speech enhancement based on a deep neural network. However, I don't know whether to do this in MATLAB or Python. In which programming language can I find more ready-made code on this subject? Which one do you…

matlab python speech-processing machine-learning deep-learning

asked Oct 13 '21 at 09:02

Zang Li

2 answers

Mel Cepstral Distortion

I am working on a speech synthesis model and I am looking to evaluate my synthesized speech. I found that most people use the Mel Cepstral Distortion (MCD) which can be calculated by the…

speech-processing speech speech-synthesis

asked Apr 02 '19 at 01:42

MrHat

1 answer

How to reduce synthesis artifacts produced by phase vocoder?

I implemented the phase vocoder algorithm in Python to time-stretch speech signals by following this paper and referring to this MATLAB tutorial. I can distinguish words in the original signal from the resynthesis result, and the pitch, as expected,…

filters speech-processing stft

asked Feb 21 '19 at 03:53

Steven Chan

1 answer

What Is the Point of Doing the Zero Padding?

What are the advantages and disadvantages of zero-padding, in particular in the case of speech signals?

fft fourier-transform dft speech-processing speech-recognition

asked Nov 11 '18 at 18:55

user38784

1 answer

Cepstrum calculus disambiguation

While learning cepstral analysis for speech recognition, I have come across two different definitions of the cepstrum (for discrete signals): $F^{-1}(\ln|F(x[n])|)$. That is, the cepstrum is the inverse Fourier transform of the logarithm of the magnitude of the…

speech-processing speech-recognition cepstral-analysis

asked Sep 29 '17 at 09:34

Carlo Benussi
