Questions tagged [audio]

12 questions
2
votes
0 answers

goodness of fit for psychometric data (perceptual threshold)

I'm running an experiment on perceptual thresholds in audio. I'll try not to bog you down with too many details: The experiment is about vibrato speed; specifically, when can you tell the difference between two stimuli that differ in vibrato speed…
2
votes
1 answer

What are good basic loss functions for audio generation? (TTS)

I'm planning to make an audio generation NN. While I'm reasonably ok with neural networks in general, wavenets, etc., something is not quite clear. What are good loss functions for audio, considering the points below? Target data may have variable…
2
votes
1 answer

Different fonts of audio from a single audio source

I know this is a duplicate question, but the original has no answer and I want something more specific. In the original question, the user Caaarlos wants to interpret different fonts of audio from a single audio source. But he only wants…
1
vote
1 answer

Looking for repeated patterns in time series data

I have spent the best part of the last few days searching forums and reading papers trying to solve the following question. I have thousands of time series arrays of varying lengths, each containing a single column vector. This column vector contains…
1
vote
0 answers

Why does almost all neural speech processing involve Mel spectrograms?

Why is almost all speech processing, whether generative or recognition, heavily based on Mel spectrograms? In a conversation with a signal processing expert I was asked why most ML systems in the speech processing domain work…
Rijul Gupta
1
vote
0 answers

What is the name of this data denoising method

I've been working on extracting data from an extremely noisy signal. The signal itself is the first derivative of the root mean square (RMS) of an audio signal that may contain segments with a single low frequency (LF). The RMS window size I'm using is…
DSPGuy
1
vote
0 answers

WGAN-GP stability loss

I am training a Conditional WaveGAN (1D DCGAN for audio) using WGAN-GP, whose generator has an autoencoder architecture. The network is trained to take an audio input, compress it, then decompress it into its original waveform. I achieved…
Harry Stuart
0
votes
0 answers

speaker recognition: training on enrollment data

I'm working on a speaker recognition challenge. I have already trained my model on the voxceleb2 dataset in a triplet setup. Now, for the challenge, I have two sets: enrollment (1 audio/subject) [IDs given] and test (random number of audios without…
0
votes
0 answers

Semi-supervised VS Self-taught learning

I want to build a speaker identification model and I am wondering which is best for the feature extraction step: (1) using unlabeled examples from the same distribution as the labeled ones (we can use the labeled data after ignoring the labels); (2) using…
0
votes
0 answers

Is it a good practice to pad signal before feature extraction?

I have a question for you: is padding before feature extraction with VGGish a good practice? Our padding technique is to find the longest signal (each signal is a loaded .wav file) and then append zeros to every shorter signal up to the size of the longest…
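
For illustration, the zero-padding described in this excerpt could look roughly like the sketch below. It assumes NumPy and mono signals already loaded as 1-D arrays; `pad_to_longest` is a hypothetical helper name that does not come from the question, and padding at the end (rather than the start) is also an assumption. It only shows the mechanics and does not settle whether padding before VGGish is a good idea.

```python
import numpy as np

def pad_to_longest(signals):
    """Right-pad each 1-D signal with zeros to the length of the longest one.

    Hypothetical helper for illustration only; not taken from the question.
    """
    max_len = max(len(s) for s in signals)
    padded = [
        # np.pad defaults to mode="constant" with zeros as the fill value.
        np.pad(np.asarray(s, dtype=np.float32), (0, max_len - len(s)))
        for s in signals
    ]
    return np.stack(padded)

# Toy stand-ins for loaded .wav signals of different lengths.
signals = [np.ones(3), np.ones(5), np.ones(1)]
batch = pad_to_longest(signals)
print(batch.shape)
```

One thing this makes visible: a short clip ends up mostly zeros, so any features computed over the padded tail describe silence rather than the recording.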
0
votes
1 answer

Conv2D Kernel size for audio-related tasks

So I've been working on this audio-rec task for a while now, and I've had some good luck using 2D convolutions on the spectrogram of audio (I've also tried Mel-spectrograms, the difference is minor in my opinion). Up until now I've been using this…
0
votes
0 answers

Benchmarking a speech recognition model with a different language

My supervisor asked me to benchmark my speech signal classification method on another language. I am doing Malay language speech recognition. To benchmark my method and the features used, I need to test on English speech. I am wondering, while doing testing…
JASMIN