Speech recognition is the process of converting spoken words to text, usually without regard to who is speaking (identifying a particular speaker is more commonly referred to as "voice recognition").
Questions tagged [speech-recognition]
187 questions
18 votes, 3 answers
human speech noise filter
Does anyone know of a filter to attenuate non-speech? I am writing speech recognition software and would like to filter out everything but human speech. This would include background noise, noise produced by a crappy microphone, or even background…
rurouniwallace
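No filter keeps only speech, but a common first step is to band-limit the signal to where most speech energy lies, for example roughly 300 to 3400 Hz (the telephone band). A minimal sketch, assuming those band edges, using the second-order band-pass biquad from the RBJ Audio EQ Cookbook (constant 0 dB peak gain). The band edges are only approximate because the bandwidth is turned into Q with a simple f0/bandwidth rule:

```python
import math

def speech_bandpass(samples, fs, low_hz=300.0, high_hz=3400.0):
    """Second-order band-pass biquad (RBJ cookbook, 0 dB peak gain) centered
    on an assumed speech band. Out-of-band rumble and hiss are attenuated;
    noise inside the band passes through unchanged.
    """
    f0 = math.sqrt(low_hz * high_hz)          # geometric center frequency
    q = f0 / (high_hz - low_hz)               # approximate bandwidth -> Q conversion
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b1, b2 = alpha / a0, 0.0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0                   # filter state: previous inputs/outputs
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

At the center frequency the gain is exactly 1 and at DC it is 0; 50 Hz mains hum comes out at roughly 14% of its input amplitude. Noise that overlaps the speech band is not touched by this kind of filter.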
12 votes, 1 answer
Determining how similar audio is to human speech
While looking for an answer to this problem, I found this board, so I decided to cross-post this question of mine from Stack Overflow.
I am searching for a method of determining the similarity between an audio segment and a human voice, which is…
Jeff Gortmaker
10 votes, 1 answer
How does noise reduction for speech recognition differ from noise reduction that is supposed to make speech more "intelligible" for humans?
This is a question that has interested me for some time now, mainly because I'm working on noise reduction for an existing speech recognition system myself.
Most papers on noise reduction techniques seem to focus on how to make speech more…
marlonfl
10 votes, 1 answer
Designing a feature vector for discriminating between different sonic waveforms
Consider the following 4 waveform signals:
signal1 = [4.1880 11.5270 55.8612 110.6730 146.2967 145.4113 104.1815 60.1679 14.3949 -53.7558 -72.6384 -88.0250 -98.4607]
signal2 = [ -39.6966 44.8127 95.0896 145.4097 144.5878 …
Andy
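One possible starting point, sketched here as an illustration rather than the asker's method: summarize each waveform with a few time-domain features. The feature set below (RMS energy, zero-crossing rate, peak amplitude, crest factor) is an assumption; whether it actually separates the four signals would have to be checked on the data. Only `signal1` is used because it is the only signal quoted in full:

```python
import math

def feature_vector(signal):
    """Small hand-crafted feature vector for one waveform (illustrative set):
    RMS energy, zero-crossing rate, peak absolute amplitude, crest factor.
    """
    n = len(signal)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    # count sign changes between consecutive samples
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (n - 1)
    peak = max(abs(x) for x in signal)
    crest = peak / rms if rms > 0 else 0.0   # peak-to-RMS ratio, always >= 1
    return [rms, zcr, peak, crest]

signal1 = [4.1880, 11.5270, 55.8612, 110.6730, 146.2967, 145.4113,
           104.1815, 60.1679, 14.3949, -53.7558, -72.6384, -88.0250, -98.4607]
print(feature_vector(signal1))
```

Because RMS and peak live on a very different scale from the zero-crossing rate, each feature is usually standardized (for example to zero mean and unit variance over the training set) before being passed to a classifier.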
9 votes, 1 answer
How to segment phone call audio into silence/non-silence?
My problem is that I don't know the energy of the background noise, so I can't just threshold the energy. The processing is done in real time, and I have about 500 ms to decide.
Ideally, I'd want quiet consonants considered non-silence.
Michael Litvin
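A common heuristic for the unknown-noise case is to estimate the noise floor on the fly instead of fixing an energy threshold. A minimal sketch under stated assumptions: non-overlapping frames, a minimum-tracking noise-floor estimate that rises slowly, and a hypothetical 6 dB margin above the floor for "non-silence":

```python
import math

def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # epsilon avoids log(0)
    return energies

def label_frames(energies_db, margin_db=6.0, rise_db=0.05):
    """Label each frame True (non-silence) or False (silence).

    The noise floor is unknown, so it is tracked: it drops at once to any
    quieter frame and otherwise creeps upward by `rise_db` per frame,
    which lets it follow slowly increasing background noise.
    """
    if not energies_db:
        return []
    floor = energies_db[0]
    labels = []
    for e in energies_db:
        if e < floor:
            floor = e          # quieter frame: follow it down immediately
        else:
            floor += rise_db   # louder frame: let the floor rise only slowly
        labels.append(e > floor + margin_db)
    return labels
```

Each decision uses only past frames, so latency is one frame (20 ms at 8 kHz with `frame_len=160`), well inside a 500 ms budget. A fixed energy margin will still tend to miss very quiet consonants; this sketch does not address that.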
9 votes, 3 answers
How does Siri recognize me saying "Hey Siri"?
I am trying to understand how my iPhone can continually listen for me saying Hey Siri, Alexa, Hey Cortana or Okay Google without quickly draining my battery.
I imagined two kinds of algorithm. One that records slices of time such as 10 ms wide…
nowox
8 votes, 2 answers
What does a "vector" in a hidden Markov model mean?
I know that a Hidden Markov Model (HMM) is used in speech recognition and understand it to some degree. However, what I don't know is how the input (speech) is "transformed" into a vector, which is later used in the HMM.
How do you get a vector from a sound…
StupidOne
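To illustrate how a waveform becomes a sequence of vectors: the signal is cut into short overlapping frames (25 ms windows every 10 ms is a common choice), each frame is reduced to a feature vector, and the HMM treats each vector as one observation. The two features per frame below (log energy and zero-crossing rate) are placeholders chosen for brevity; real recognizers typically use MFCCs or similar spectral features instead:

```python
import math

def frames(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Split a signal into overlapping frames (default 25 ms window, 10 ms hop)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def observation_vectors(signal, sample_rate):
    """One feature vector per frame; this sequence is what the HMM observes.

    Placeholder features: log energy and zero-crossing rate (not MFCCs).
    """
    vectors = []
    for f in frames(signal, sample_rate):
        log_energy = math.log(sum(x * x for x in f) + 1e-10)
        zcr = sum(1 for a, b in zip(f, f[1:]) if (a < 0) != (b < 0)) / (len(f) - 1)
        vectors.append([log_energy, zcr])
    return vectors
```

For one second of 16 kHz audio this yields 98 two-dimensional vectors, one observation every 10 ms.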
8 votes, 1 answer
What's the correct graphical interpretation of a series of MFCC vectors?
I'm studying speech-recognition, in particular the use of MFCC for feature extraction. All examples I've found online tend to graph a series of MFCC extracted from a particular utterance as follows (graph generated by me from the software I'm…
jotadepicas
7 votes, 1 answer
Distinguish vowels from consonants
I am working on a speech processing problem: I need to determine the phonemes and identify which ones are vowels and which are consonants. Has anyone worked on this? What work on the subject is worth reading?
ekruten
7 votes, 3 answers
Why does the excitation signal appear, separated, at high quefrencies in the cepstrum?
So, I've just begun a speech and language processing course and have found the explanation of the process of getting the cepstrum of a signal and its properties a little confusing. The following is a description of my current understanding and an…
Sam
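A toy numerical illustration of the separation (our own construction, not taken from the question). Because the log of a product is a sum, the real cepstrum of a convolution is the sum of the two cepstra. Here a two-tap filter stands in for the smooth spectral envelope, and a pulse followed by an echo D = 20 samples later stands in for the periodic excitation (a single echo instead of a full pulse train avoids exact spectral zeros in this short DFT). The envelope's contribution dies out within a few samples of quefrency 0, while the excitation shows up as a peak at quefrency D:

```python
import cmath
import math

def real_cepstrum(x):
    """Real cepstrum c[q] = IDFT(log|DFT(x)|), using a naive O(N^2) DFT."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    log_mag = [math.log(abs(s)) for s in spectrum]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
            for q in range(n)]

def convolve(a, b):
    """Plain linear convolution of two lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

N, D = 64, 20
envelope = [1.0, -0.5]                          # toy "vocal tract": smooth spectrum
excitation = [1.0] + [0.0] * (D - 1) + [0.5]    # toy "excitation": pulse + echo D samples later
x = convolve(envelope, excitation)
x += [0.0] * (N - len(x))                       # zero-pad to N samples
c = real_cepstrum(x)

print(round(c[1], 3))                               # → -0.25 (envelope, low quefrency)
print(max(range(10, N // 2), key=lambda q: c[q]))   # → 20 (excitation peak at quefrency D)
```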
7 votes, 1 answer
How does this equation correspond to smoothing?
Please help me understand smoothing of data. This is a follow-up to my previous question posted here, especially the top answer by Junuxx, where he says a way of smoothing a function $f(x)$ is:
$$
f'[t] = 0.1 f[t-1] + 0.8 f[t] + 0.1 f[t+1]
$$
here we…
user13267
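The quoted equation is a three-tap weighted moving average: each output sample is mostly the current input plus a little of its two neighbours. A minimal sketch; the edge handling (reusing the end sample) is an assumption, since the equation does not say what happens at the boundaries:

```python
def smooth(f, w_side=0.1, w_center=0.8):
    """Three-point weighted moving average:
    f'[t] = w_side*f[t-1] + w_center*f[t] + w_side*f[t+1].

    Endpoints reuse the nearest sample (edge replication), an arbitrary choice.
    """
    n = len(f)
    out = []
    for t in range(n):
        prev = f[t - 1] if t > 0 else f[t]
        nxt = f[t + 1] if t < n - 1 else f[t]
        out.append(w_side * prev + w_center * f[t] + w_side * nxt)
    return out

print(smooth([0.0, 0.0, 1.0, 0.0, 0.0]))  # → [0.0, 0.1, 0.8, 0.1, 0.0]
```

Because the weights sum to 1, a constant signal passes through unchanged, while rapid sample-to-sample changes are shrunk. In frequency terms the symmetric taps give $H(e^{j\omega}) = 0.1e^{-j\omega} + 0.8 + 0.1e^{j\omega} = 0.8 + 0.2\cos\omega$, so the gain is 1 at $\omega = 0$ and 0.6 at $\omega = \pi$ (a signal alternating +1, -1, ...). That low-pass behaviour is what makes it a smoother.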
6 votes, 1 answer
Hidden Markov Model for Speech Recognition. HMM Number of States
This is a question that came to mind as a result of a previous question Hidden Markov Models - Distinct Observation Symbols and subsequent answer from @pichenettes.
One approach to speech recognition is to use Hidden Markov Models (HMM) to identify…
user2718
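The number of states is a modelling choice rather than something derived from the signal. A common setup (for example, three emitting states per phone) uses a left-to-right topology, in which each state can only loop on itself or move forward. A sketch of such a transition matrix, with a hypothetical self-loop probability `p_stay` that would normally be re-estimated during training:

```python
def left_to_right_transitions(n_states, p_stay=0.6):
    """Transition matrix for a left-to-right HMM with no skip transitions.

    Each state stays with probability p_stay or moves to the next state.
    The last state only loops on itself here; in a connected recognizer
    it would exit into the next model instead.
    """
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i < n_states - 1:
            a[i][i] = p_stay
            a[i][i + 1] = 1.0 - p_stay
        else:
            a[i][i] = 1.0
    return a

for row in left_to_right_transitions(3):
    print(row)
```

One consequence of this topology without skips: a model with N states needs at least N frames to traverse, so the state count also sets a minimum duration for the unit being modelled.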
5 votes, 0 answers
Zero-padding of MFCC coefficients
I am trying to implement speech recognition using the backpropagation algorithm, and I have been following this paper.
I have followed it all the way, except that it tells me to zero-pad the coefficients when there is an empty slot, because the MFCC…
motiur
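We can only guess at the paper's exact scheme, but a common reason for zero-padding is that a backpropagation network has a fixed-size input layer, while utterances produce different numbers of MFCC frames. A minimal sketch under that assumption (`max_frames` and `n_coeffs` are hypothetical parameters):

```python
def pad_mfcc(frames, max_frames, n_coeffs):
    """Zero-pad (or truncate) a list of MFCC frames to exactly `max_frames`
    frames, then flatten into one vector of max_frames * n_coeffs values,
    assuming a fixed-size network input layer of that width.
    """
    fixed = [list(f) for f in frames[:max_frames]]                       # truncate long utterances
    fixed += [[0.0] * n_coeffs for _ in range(max_frames - len(fixed))]  # pad short ones
    return [c for frame in fixed for c in frame]

print(pad_mfcc([[1.0, 2.0], [3.0, 4.0]], max_frames=4, n_coeffs=2))
```

One caveat: an all-zero frame does not generally correspond to the MFCCs of actual silence (the 0th coefficient is closely related to log energy), so the network has to learn that padded slots carry no information.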
5 votes, 2 answers
Dynamic Time Warping - Comparing Values
OK, so I'm trying to compare two different speech signals and I have run into a problem. Here goes:
I have split the signal into blocks, and I have computed the MFCC coefficients of each block. I then use a DTW algorithm to compare the (inputted)…
Phorce
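For reference, the textbook DTW recursion over per-block MFCC vectors can be sketched as below. Euclidean distance between frames and the three standard steps are assumptions, since the question does not say which local distance or step pattern is in use:

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors
    (e.g. per-block MFCC vectors), with Euclidean frame distance and the
    standard steps: advance a, advance b, or advance both.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# A time-stretched copy of the same sequence aligns at zero cost.
print(dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]))
```

The raw cost grows with sequence length, so when comparing an input against templates of different lengths it is common to normalize it, for example by dividing by `len(a) + len(b)`.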
5 votes, 1 answer
What are i-vectors and x-vectors in the context of Speech Recognition?
I have read that i-vectors and x-vectors are widely used in speaker recognition tasks, but I don't understand the difference between them or how exactly they work. Can someone explain them, starting from the basics and going into some technical detail?
I came across the following…
mausamsion
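Whatever extractor produces them, both i-vectors and x-vectors summarize an utterance as a single fixed-length vector. A simple back-end commonly used to compare two of them is cosine similarity (PLDA is the other common choice). A minimal sketch, assuming the embeddings come from some extractor not shown here:

```python
import math

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two fixed-length speaker embeddings."""
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm = math.sqrt(sum(x * x for x in emb_a)) * math.sqrt(sum(y * y for y in emb_b))
    return dot / norm

print(cosine_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A verification decision would then compare the score against a threshold; any particular threshold value is hypothetical and would be tuned on held-out data.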