Questions tagged [speech-synthesis]

18 questions
13
votes
1 answer

Speech synthesis requiring very little CPU performance?

Back in the days of 1 MHz 8-bit CPU personal computers (Apple II, Atari 800, et al.), there were software programs that could do comprehensible arbitrary text-to-speech synthesis on those PCs. What published speech synthesis algorithms might be…
hotpaw2
  • 33,409
  • 7
  • 40
  • 88
4
votes
2 answers

Mel Cepstral Distortion

I am working on a speech synthesis model and I am looking to evaluate my synthesized speech. I found that most people use the Mel Cepstral Distortion (MCD) which can be calculated by the…
MrHat
  • 81
  • 1
  • 7
3
votes
3 answers

How to simulate different kinds of noise in a speech signal?

What are the different kinds of noise in a speech signal? How can I simulate them in Matlab so that they can be added to a clean speech signal?
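One common approach, sketched here in Python with NumPy rather than Matlab, is to scale a noise signal so that the mixture reaches a chosen signal-to-noise ratio. The helper name `add_noise_at_snr`, the white Gaussian noise, the sine-wave stand-in for speech, and the 10 dB target are all assumptions made for illustration; another noise type would simply replace the `noise` array.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` at a target SNR in dB (hypothetical helper).

    Assumes `noise` is at least as long as `clean`.
    """
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)   # average power of the clean signal
    p_noise = np.mean(noise ** 2)   # average power of the raw noise
    # Gain that makes p_clean / (gain^2 * p_noise) equal 10^(snr_db / 10)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

rng = np.random.default_rng(0)
sample_rate = 16000                        # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2.0 * np.pi * 220.0 * t)    # stand-in for a clean speech signal
noise = rng.standard_normal(len(clean))    # white Gaussian noise
noisy = add_noise_at_snr(clean, noise, snr_db=10.0)
```

Working in terms of average power keeps the helper independent of the noise source, so recorded background noise could be passed in the same way.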
2
votes
0 answers

How to use linear predictive coding to compress voice diphone samples?

I'm working on an experimental diphone / unit selection speech synthesizer for my native language, which lacks a good speech synthesizer for blind people. The problem is that the recorded unit library can get very large (hundreds of megabytes, as seen in…
2
votes
0 answers

What is actually transmitted following LPC of a speech frame?

For each frame, what's sent over to the receiver for decoding? The coefficients, pitch in some bits, voiced/unvoiced classification in another bit? Another broad question to develop intuition about LPC and CELP for speech encoding. By using…
panthyon
  • 1,083
  • 11
  • 24
1
vote
1 answer

Computation of filter parameters to match a given frequency response

I'm looking for a practical way to compute the parameters ($a_1$, $a_2$) of a digital filter to match a certain frequency response. I'm studying the 12-pole filter of the vintage SP0256 component, which is used in speech synthesis, using bandpass…
1
vote
0 answers

Bandwidth-enhanced sinusoidal model with oscillator banks

I am currently implementing the bandwidth-enhanced sinusoidal model for an additive synthesizer. This model makes it possible to accurately produce noisy sounds by adding a noise component to each sinusoid. But I am having some trouble understanding the technical…
Onirom
  • 113
  • 5
1
vote
1 answer

Effect of redundant training data in HMM-based speech recognizer/synthesizer?

How are redundant training data handled during the training stage? For example, assume we have one observation for phone $\theta$ in the training set. Then the training (for a monophone) is done with: $$\lambda_{max}^\theta = \text{arg}…
1
vote
1 answer

Why are the observation features of an HMM-based recognition/synthesis system modeled by a Gaussian distribution?

Why are the observation features (namely MFCCs) of an HMM-based recognition/synthesis system modeled by a Gaussian distribution? Even the state duration is modeled by a Gaussian in this paper: K. Tokuda et al., "Speech synthesis based on hidden…
1
vote
1 answer

Removing vocoder effect from audio file

There are many tutorials and guides explaining how to create a "robotic" voice by using a vocoder or a mix of filters that change pitch, tempo and maybe add a slight echoed delay. However, is it possible to do the opposite? How would you take a…
Cerin
  • 588
  • 5
  • 9
1
vote
1 answer

In framing of audio samples, why is a frame shift needed in addition to a frame size?

When framing audio samples for audio feature extraction, why is a frame shift needed in addition to a frame size? For example, frame size = 20 ms, frame shift = 10 ms. Rather than shifting/overlapping, why can't we use continuous frames of…
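As a rough illustration of the setup described in this question (20 ms frames with a 10 ms shift), the sketch below splits a signal into overlapping frames with NumPy. The function name `frame_signal`, the 16 kHz sampling rate, and the dummy input are assumptions made for the example, not part of the question.

```python
import numpy as np

def frame_signal(x, sample_rate, frame_ms=20.0, shift_ms=10.0):
    """Split a 1-D signal into overlapping frames (hypothetical helper)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # samples per frame
    shift = int(round(sample_rate * shift_ms / 1000.0))      # samples between frame starts
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // shift
    # Row i of the index matrix selects samples [i*shift, i*shift + frame_len)
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx]

sample_rate = 16000                      # assumed sampling rate in Hz
x = np.arange(sample_rate, dtype=float)  # one second of dummy samples
frames = frame_signal(x, sample_rate)
```

With these values each frame holds 320 samples and starts 160 samples after the previous one, so the second half of every frame is repeated as the first half of the next.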
1
vote
2 answers

PSOLA: how to calculate the synthesis marks

I have successfully extracted all pitch markers from the waveforms, using autocorrelation to find the pitch periods. The markers can be observed in the figure below: I want to change my signal (pitch or speed) using PSOLA, and I need to find the synthesized…
0
votes
1 answer

Sample text to collect all possible English biphones for Text-To-Speech

Since the list of phonemes in English is fixed, it should be possible to come up with sample text(s) that cover all possible biphones for text-to-speech synthesis. Does anyone have a sample text for this purpose?
alvas
  • 51
  • 5
0
votes
1 answer

How does a digital piano work?

I know that it somehow combines different harmonics to synthesize a note, but I don't know exactly how this happens. Does a digital piano use PWM to synthesize a note?
0
votes
2 answers

Conceptually confused by LPC for speech: Do we synthesize by the inverse filter (FIR)?

I've been following the process of using LPC to analyse the speech and then synthesize the speech by swapping out coefficients over time. I'm purely concerned with pitched vowels at this point. lpc(speech_segment) gives us $a_{p}$ coefficients of an…