Questions tagged [speech-recognition]

Automatic speech recognition (ASR) aims to identify words and phrases in spoken language and convert them to a machine-readable format.

53 questions
7
votes
1 answer

Turkish speech recognition (speech->text) in Google Speech API?

Google's Speech API offers audio speech-to-text in multiple languages, and it supports Turkish too. That language is very interesting: it is so-called agglutinative, meaning you stick word parts one after another instead of using prepositions and other parts…
Aksakal
3
votes
0 answers

The state-of-the-art methods for speech recognition?

Recently I noticed that Google and Apple have really high quality speech-recognition services. I was wondering about the state-of-the-art methods and techniques they are/might be using to achieve such quality. I already know that Hidden Markov…
3
votes
0 answers

Validation loss is less than training loss by 5 units. How is this result interpreted?

I am training a Keras model for end-to-end speech recognition. I have my own dataset of about 400 speech wave files. Text transcriptions are also given as input. Model summary is: Layer (type) Output Shape Param # the_input (InputLayer)…
3
votes
0 answers

Program to evaluate the output of a speech recognition system

I am looking for a library, script or program that can evaluate the output of a speech recognition system. The output of the speech recognition system is a simple text file, and I have the gold output in the same format. I have crossposted the…
2
votes
0 answers

Confusion about the derivative in CTC

I was going through the original CTC paper by Graves et al., and I am still not getting how, after taking the derivative of equation 14, we get equation 15 as shown below. I understand the part that we are considering only those paths that involve the label…
2
votes
1 answer

Mismatching dimensions of input/output in the WaveNet model for text-to-speech generation?

I have been trying to understand how speech generation works, particularly in the WaveNet model by Google. I was referring to the original WaveNet paper and this implementation: I find the model very confusing in the input it takes and the…
2
votes
1 answer

Method for detecting previously unseen class

Is there any common practice for detecting a new class, or data associated with a previously unseen event? I'm doing some research into speech recognition, and I'm trying to detect when a speech recognizer encounters a speaker it hasn't seen…
Cerin
2
votes
1 answer

When use CTC-loss for speech recognition?

I'm trying to understand and implement CTC loss for speech recognition (here on SO). I'd like to have more information about the use cases of this technique. From what I understood, it is more dedicated to understanding sentences (e.g. "Please close…
2
votes
1 answer

Streaming audio to neural network

I am trying to create a neural network that performs speaker recognition. I would like to be able to serve it such that it takes streaming audio - i.e. I want to perform partial recognition on 100ms frames and then calculate an average at the end. I…
2
votes
1 answer

How to use GMMs for acoustic signal classification?

There are a number of applications of Gaussian Mixture Models (GMMs) to acoustics/audio data for the purposes of classification; e.g. paper1 and paper2. GMMs for clustering and source position generation can be understood. What is…
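The classification use of GMMs asked about here is usually likelihood scoring: fit one mixture per class, then assign a test vector to the class whose model gives it the highest log-likelihood. A minimal numpy sketch, using single-component diagonal-covariance Gaussians as a stand-in for full mixtures; the synthetic data and class names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D "acoustic feature" data for two classes (synthetic, for
# illustration only -- real systems would use MFCC vectors).
speech = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
noise = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(200, 2))

def fit_gaussian(X):
    """Fit a single diagonal-covariance Gaussian (a 1-component GMM)."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(X, mean, var):
    """Per-sample log density under the diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var,
                         axis=1)

models = {"speech": fit_gaussian(speech), "noise": fit_gaussian(noise)}

# Classify a test point by the class whose model assigns it the
# highest log-likelihood (equal class priors assumed).
x = np.array([[0.2, -0.1]])
scores = {c: log_likelihood(x, m, v)[0] for c, (m, v) in models.items()}
print(max(scores, key=scores.get))  # prints "speech"
```

With real mixtures the per-class model would simply sum weighted component densities; the decision rule stays the same.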
2
votes
1 answer

Verifying Time Warp

Time warping is widely assumed in the domain of speech processing. If $X_w(t)$ represents a time-warped version of $X(t)$, then $X_w(t) = X(t - w(t))$, where $w(t)$ is an arbitrary function with a bounded derivative. I think it has a direct relationship…
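The warped signal in this excerpt can be checked numerically by interpolating the sampled signal at the shifted times. A minimal numpy sketch, assuming a hypothetical warp $w(t) = 0.1\sin(2\pi t)$, whose derivative is bounded by $0.2\pi$:

```python
import numpy as np

# Sample X(t) = sin(2*pi*t) on a uniform grid.
t = np.linspace(0.0, 1.0, 1001)
x = np.sin(2 * np.pi * t)

# Hypothetical warp function with a bounded derivative (|w'| <= 0.2*pi).
w = 0.1 * np.sin(2 * np.pi * t)

# X_w(t) = X(t - w(t)), evaluated by linear interpolation of the samples.
x_warped = np.interp(t - w, t, x)

# Sanity check: with w(t) = 0 the warp is the identity, X_w(t) = X(t).
x_identity = np.interp(t - np.zeros_like(t), t, x)
print(np.allclose(x_identity, x))  # prints True
```

A nonzero warp visibly stretches and compresses the waveform locally, which is exactly the variability time-warp models are meant to capture in speech.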
2
votes
1 answer

Word Error Rate over Data Set

In speech to text, one common metric is the word error rate (WER). WER is the word-level Levenshtein distance, i.e. the minimum number of substitutions ($S$), deletions ($D$), and insertions ($I$) needed to transform the prediction into the ground truth…
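The metric described in this excerpt can be computed directly with dynamic programming over words. A minimal sketch (the function name is my own):

```python
from typing import List

def wer(reference: List[str], hypothesis: List[str]) -> float:
    """Word error rate: word-level Levenshtein distance (S + D + I)
    divided by the number of reference words."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dp[i][0] = i          # i deletions
    for j in range(cols):
        dp[0][j] = j          # j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            sub = dp[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            dp[i][j] = min(sub,                # substitution (or match)
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[-1][-1] / len(reference)

print(wer("the cat sat on the mat".split(),
          "the cat sit on mat".split()))  # 2 errors / 6 words = 0.333...
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, since the denominator is only the reference length.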
2
votes
1 answer

MFCCs and MoG-HMMs for speech recognition

BACKGROUND MFCCs are coefficients which represent the most important parts of speech; about 12 of them are used to model a single 512-sample frame of speech. Along with them you would use delta coefficients, which track the change of the MFCCs…
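The delta coefficients mentioned in this excerpt are commonly computed with the standard regression formula $d_t = \sum_{n=1}^{N} n (c_{t+n} - c_{t-n}) / (2 \sum_{n=1}^{N} n^2)$. A numpy sketch assuming a window of $N = 2$ and a made-up MFCC matrix:

```python
import numpy as np

def delta(coeffs: np.ndarray, N: int = 2) -> np.ndarray:
    """Delta (differential) features via the standard regression formula,
    computed per frame with edge padding at the boundaries.

    coeffs: (num_frames, num_mfcc) matrix of static MFCCs."""
    padded = np.pad(coeffs, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    deltas = np.zeros_like(coeffs, dtype=float)
    for t in range(coeffs.shape[0]):
        # padded[t + N] corresponds to frame t of the original matrix.
        deltas[t] = sum(n * (padded[t + N + n] - padded[t + N - n])
                        for n in range(1, N + 1)) / denom
    return deltas

# Hypothetical 5-frame, 3-coefficient MFCC matrix increasing by 3 per frame:
# on this linear ramp, interior deltas recover the slope (3.0).
mfcc = np.arange(15, dtype=float).reshape(5, 3)
print(delta(mfcc))
```

Delta-deltas (acceleration features) are obtained by applying the same function to the delta matrix.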
2
votes
2 answers

Why has deep learning only shown decent results in the fields of computer vision and speech recognition?

We all know about the success of ImageNet, AlphaGo etc which used deep neural networks in computer vision, or the use of RNNs in Google Translate. But why are we not seeing similar advances in other fields like finance?
1
vote
0 answers

Hidden Markov models in Speech Recognition

My first question here. I am trying to build a sign language translator (from signs to text) and noticed that the problem itself is quite similar to speech recognition, so I started to research that. Right now one thing I can't figure out…