Highest Voted 'audio' Questions - Statistical Analysis Stack Exchange

2

votes

0 answers

goodness of fit for psychometric data (perceptual threshold)

I'm running an experiment on perceptual thresholds in audio. I'll try not to bog you down with too many details: The experiment is about vibrato speed; specifically, when can you tell the difference between two stimuli that differ in vibrato speed…

asked Dec 30 '20 at 22:43

Max

21
1

2

votes

1 answer

What are good basic loss functions for audio generation? (TTS)

I'm planning to make an audio generation NN. While I'm reasonably ok with neural networks in general, wavenets, etc., something is not quite clear. What are good loss functions for audio, considering the points below? Target data may have variable…

neural-networks conv-neural-network loss-functions audio

asked Mar 25 '20 at 14:45

Daniel Möller

121
3

2

votes

1 answer

Different fonts of audio from a single audio source

I know this is a duplicate question, but the original one has no answer and I want something more specific. In the original question, the user Caaarlos wants to interpret different fonts of audio from a single audio source. But he only wants…

machine-learning neural-networks spectral-analysis audio

asked Apr 20 '16 at 00:22

Pasdf

53
4

1

vote

1 answer

Looking for repeated patterns in time series data

I have spent the best part of the last few days searching forums and reading papers trying to solve the following question. I have thousands of time series arrays of varying lengths, each containing a single column vector. This column vector contains…

time-series python seasonality fourier-transform audio

asked Jul 28 '21 at 19:06

Dexter Jeffery

11
1

1

vote

0 answers

Why almost all neural speech processing involves Mel Spectrograms?

What are the reasons behind almost all speech processing, whether generative or recognition, being heavily based on Mel Spectrograms? In a conversation with a signal processing expert I was asked why most ML systems in the speech processing domain work…

speech-recognition audio

asked May 12 '21 at 14:52

Rijul Gupta

111
2

1

vote

0 answers

What is the name of this data denoising method

I've been working on extracting data from an extremely noisy signal. The signal itself is the 1st derivative of the root mean square (RMS) of an audio signal that may contain segments with some single low frequency (LF). The RMS window size I'm using is…

rms noise audio

asked Jul 18 '20 at 11:44

DSPGuy

11
2
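A minimal sketch of the signal described in the denoising question above, assuming non-overlapping windows and a first difference as the discrete stand-in for the 1st derivative. The excerpt does not give the window size, so `window=4` and the sample values are purely hypothetical, and the function names are illustrative rather than the asker's own.

```python
import math
from typing import List, Sequence


def windowed_rms(samples: Sequence[float], window: int) -> List[float]:
    """Root mean square over consecutive, non-overlapping windows.

    Assumption: windows do not overlap; a trailing partial window is dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    rms_values = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms_values.append(math.sqrt(sum(x * x for x in chunk) / window))
    return rms_values


def first_difference(values: Sequence[float]) -> List[float]:
    """Discrete approximation of the first derivative: values[i+1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical samples: a quiet segment followed by a louder one.
samples = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
rms = windowed_rms(samples, window=4)  # hypothetical window size
d_rms = first_difference(rms)
```

A jump in loudness between windows shows up as a large positive value in `d_rms`, which is why the differenced RMS tends to be noisy when the window is short.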

1

vote

0 answers

WGAN-GP stability loss

I am training a Conditional WaveGAN (1D DCGAN for audio) using WGAN-GP whose generator has an autoencoder architecture. The network is trained to take an audio input, compress it, then decompress it into its original waveform. I achieved…

machine-learning tensorflow autoencoders gan audio

asked Apr 08 '20 at 23:05

Harry Stuart

219
1
6

0

votes

0 answers

speaker recognition: training on enrollment data

I'm working on a speaker recognition challenge. I have already trained my model on the VoxCeleb2 dataset in a triplet setup. Now, for the challenge, I have two sets: enrollment (1 audio/subject) [IDs given] and test (random number of audios without…

neural-networks speech-recognition triplet-loss audio

asked Aug 09 '21 at 00:03

Zabir Al Nazi

85
6

0

votes

0 answers

Semi-supervised VS Self-taught learning

I want to build a Speaker Identification model, and I am wondering which is best for the feature extraction step: using unlabeled examples from the same distribution as the labeled ones (we can use the labeled data after ignoring the labels), or using…

machine-learning neural-networks transfer-learning semi-supervised-learning audio

asked May 07 '21 at 06:58

Kais Hasan

101
3

0

votes

0 answers

Is it a good practice to pad signal before feature extraction?

I have a question for you: is padding before feature extraction with VGGish a good practice? Our padding technique is to find the longest signal (each one is a loaded .wav signal) and then pad every shorter signal with zeros to the size of the longest…

feature-engineering signal-processing audio

asked Apr 21 '21 at 12:24

Dawid_K

1
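The padding technique described in the question above can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not say where the zeros are placed, so the sketch appends them at the end, and the helper name `pad_to_longest` and the sample values are hypothetical stand-ins for real loaded .wav data.

```python
from typing import List, Sequence


def pad_to_longest(signals: Sequence[Sequence[float]]) -> List[List[float]]:
    """Right-pad every signal with zeros to the length of the longest one.

    Assumption: zeros are appended at the end of each shorter signal.
    """
    target = max((len(s) for s in signals), default=0)
    return [list(s) + [0.0] * (target - len(s)) for s in signals]


# Hypothetical sample data standing in for loaded .wav signals.
signals = [[0.1, -0.2, 0.3], [0.5], [0.0, 0.4]]
padded = pad_to_longest(signals)
```

After this step every entry in `padded` has the same length, so the signals can be batched; whether the added silence hurts the extracted features is exactly what the question asks.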

0

votes

1 answer

Conv2D Kernel size for audio-related tasks

So I've been working on this audio-rec task for a while now, and I've had some good luck using 2D convolutions on the spectrogram of audio (I've also tried Mel-spectrograms, the difference is minor in my opinion). Up until now I've been using this…

machine-learning keras spectral-analysis audio

asked Dec 22 '19 at 02:02

Nikita Jerschow

53
4

0

votes

0 answers

Benchmarking model in speech recognition with different language

My supervisor asked me to benchmark my method for classifying speech signals against another language. I am doing Malay language speech recognition. To benchmark the method/features I use, I need to test English speech. I am wondering, while doing testing…

validation pattern-recognition audio

asked Jul 02 '14 at 07:37

JASMIN

21
3
