
Is there any common practice for detecting a new class, or data associated with a previously unseen event?

I'm doing some research into speech recognition, and I'm trying to detect when a speech recognizer encounters a speaker it hasn't seen before. I'm able to segment speakers using the UIS-RNN algorithm published by Google, but when I try to parse the segments and recognize who's speaking, I occasionally run into a speaker that hasn't been tagged with an identity. I haven't found a good way to detect this event.

My current process is to build an SVC classifier, trained on the MFCC features of each speaker's audio. Given a new segment of audio, this gives me a probability breakdown of which speaker class the audio is likely to belong to. I then also run the initial training set through the classifier and record the probability for each training sample, to estimate the mean and standard deviation of the classifier's probability output for each class.

Then, when I classify a new audio segment and get its probability, I decide a new speaker has been encountered if that probability falls outside the range defined by the predicted label's mean plus or minus the standard deviation of error.
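In code, the idea looks roughly like this; this is a simplified sketch where random placeholder data stands in for real MFCC features and speaker labels:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 40))          # placeholder MFCC feature vectors
y_train = rng.integers(0, 4, size=200)   # placeholder known-speaker labels

clf = SVC(probability=True).fit(X_train, y_train)

# Per-class statistics of the classifier's probabilities on its own training set
train_probs = clf.predict_proba(X_train)
stats = {}
for col, label in enumerate(clf.classes_):
    p = train_probs[y_train == label, col]
    stats[label] = (p.mean(), p.std())

# Flag a possible unseen speaker when the winning probability falls outside
# the predicted class's mean +/- one standard deviation
x_new = rng.random((1, 40))              # placeholder new audio segment
probs = clf.predict_proba(x_new)[0]
col = int(np.argmax(probs))
mean, std = stats[clf.classes_[col]]
is_new_speaker = not (mean - std <= probs[col] <= mean + std)
```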

This method works OK, but not great. On some toy sample data it's right about 80% of the time, and it would presumably perform worse on larger and noisier data.

I'm having trouble researching this task, since I don't know its formal name. I'm assuming I'm not the first person to want to do this. What is this process of detecting new speakers/classes called, and is there a better technique?

Cerin

1 Answer


This problem is usually called Novelty Detection or Out-of-Distribution (OOD) detection. Many of the techniques for Anomaly Detection can also be applied.

Autoencoders are a very popular family of methods that work well for this kind of problem. An autoencoder consists of an Encoder followed by a Decoder, each usually some sort of neural network. The network can be feed-forward, convolutional, or recurrent, depending on what suits the task best. The Encoder's layers gradually compress the input, and the Decoder then expands it back to the input dimensions. This learns an efficient, data-dependent compression that reconstructs samples similar to the training set well, and samples that are different from it poorly.
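A minimal feed-forward autoencoder in Keras might look like the sketch below; the layer sizes and the assumption of fixed-length MFCC feature vectors are placeholders to adapt to your data:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 40      # e.g. 40 MFCC statistics per segment (assumption)
bottleneck = 8      # size of the compressed code; tune for your data

inputs = keras.Input(shape=(input_dim,))
# Encoder: gradually compress the input
x = layers.Dense(24, activation="relu")(inputs)
code = layers.Dense(bottleneck, activation="relu")(x)
# Decoder: expand the code back to the input dimensions
x = layers.Dense(24, activation="relu")(code)
outputs = layers.Dense(input_dim, activation="linear")(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train the network to reconstruct features of the known speakers
X_train = np.random.rand(1000, input_dim)  # placeholder for real features
autoencoder.fit(X_train, X_train, epochs=20, batch_size=32, verbose=0)
```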

For Novelty/Anomaly Detection, one calculates the reconstruction error for new samples. If this is above a certain threshold, the sample is considered novel/anomalous. The threshold can be set by inspecting the histogram of reconstruction errors on the training data.
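For example, a small helper along these lines; the 99th-percentile cutoff is an arbitrary choice, and `model` is assumed to be a trained Keras autoencoder like the one sketched above:

```python
import numpy as np

def detect_novel(model, X_train, X_new, percentile=99.0):
    """Flag samples whose reconstruction error exceeds a threshold
    taken from the distribution of errors on the training data."""
    def errors(X):
        # Per-sample mean squared reconstruction error
        return np.mean((X - model.predict(X, verbose=0)) ** 2, axis=1)
    threshold = np.percentile(errors(X_train), percentile)
    return errors(X_new) > threshold  # True = likely novel/anomalous
```

In your setting, a segment flagged this way would be a candidate for a speaker the classifier has never seen, which you could then add as a new class.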

Jon Nordby