Different sources of audio from a single audio source

Question

I know this is a duplicate question, but the original one has no answer and I want something more specific. In the original question, the user Caaarlos wants to interpret different sources of audio from a single audio source, but he only wants to separate the words, no matter who is speaking. I want to know who is speaking: as in his question, but separating speaker_one from speaker_two. If it is not possible to detect different speakers from a single audio source, is it at least possible to separate the words in a single audio source?

score 4 · Accepted Answer · answered Apr 20 '16 at 00:32

This is called speaker identification (if the speakers are known beforehand) or diarization (if the speakers are not known beforehand).
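To make the distinction concrete, here is a toy sketch in Python. It assumes each audio segment has already been turned into a fixed-length feature vector; the vectors, the names `identify` and `diarize`, the enrolled profiles, and the 0.8 threshold are all made up for illustration. Real toolkits extract features from the audio and use far more robust models and clustering, so treat this only as an illustration of the idea, not as how any particular library works.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(segment, enrolled, threshold=0.8):
    """Speaker identification: match a segment against KNOWN speaker profiles.

    Returns the best-matching enrolled name, or None if nothing is close enough.
    """
    best_name, best_score = None, threshold
    for name, profile in enrolled.items():
        score = cosine(segment, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def diarize(segments, threshold=0.8):
    """Diarization: group segments by speaker with NO prior profiles.

    Greedy clustering (a deliberate simplification): each segment joins the
    most similar existing cluster, or starts a new one.
    """
    cluster_sums = []  # running sum of vectors per cluster
    labels = []
    for seg in segments:
        best_idx, best_score = None, threshold
        for i, total in enumerate(cluster_sums):
            # Cosine similarity ignores scale, so the sum acts as a centroid.
            score = cosine(seg, total)
            if score >= best_score:
                best_idx, best_score = i, score
        if best_idx is None:
            cluster_sums.append(list(seg))
            best_idx = len(cluster_sums) - 1
        else:
            cluster_sums[best_idx] = [
                t + s for t, s in zip(cluster_sums[best_idx], seg)
            ]
        labels.append(f"speaker_{best_idx + 1}")
    return labels


# Hypothetical per-segment vectors standing in for real audio features.
segments = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.8, 0.2, 0.1],
    [0.0, 1.0, 0.2],
]

# Diarization: anonymous labels, discovered from the data alone.
print(diarize(segments))

# Identification: requires profiles enrolled ahead of time (hypothetical here).
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print([identify(seg, enrolled) for seg in segments])
```

Diarization can only tell you that segments 1 and 3 share a speaker and segments 2 and 4 share another; putting real names on them needs enrolled profiles, which is the identification case.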

Google does not implement this feature yet, but some APIs do; for example, Microsoft has a speaker recognition API.

For a description of the algorithms, you can read the book Fundamentals of Speaker Recognition.

For open source toolkits, you can check Alize and LIUM speaker diarization.
