I know this is a duplicate question, but the original one has no answer and I want something more specific. In the original question, the user Caaarlos wants to interpret the different audio sources in a single recording, but he only wants to separate the words, no matter who is speaking. I want to know who is speaking: as in his question, I want to separate speaker_one from speaker_two. If it is not possible to detect different speakers in a single audio source, is it at least possible to separate the words in a single audio source?
This is called speaker identification (if the speakers are known beforehand) or speaker diarization (if the speakers are not known beforehand).
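To make the difference between the two tasks concrete, here is a toy sketch in Python. It is not taken from any of the tools mentioned here and assumes each short audio segment has already been converted into a fixed-length "speaker embedding"; the segment times, the 2-D vectors, and the `identify` and `diarize` functions are all hypothetical. Identification compares a segment against enrolled reference voices. Diarization has no references, so it clusters the segments (here with a simple k-means on cosine similarity, not the methods real toolkits such as Alize or LIUM use) and reports speaker turns with arbitrary labels.

```python
import math

# Hypothetical per-segment speaker embeddings: (start_s, end_s, vector).
# A real system would compute these from the audio (e.g. MFCC statistics,
# i-vectors or x-vectors); these 2-D vectors are made up so the sketch runs.
SEGMENTS = [
    (0.0, 1.0, [1.0, 0.1]),
    (1.0, 2.0, [0.9, 0.2]),
    (2.0, 3.0, [0.1, 1.0]),
    (3.0, 4.0, [0.2, 0.9]),
    (4.0, 5.0, [1.0, 0.0]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify(vector, enrolled):
    """Speaker identification: the speakers are known beforehand, so return
    the enrolled name whose reference vector is most similar."""
    return max(enrolled, key=lambda name: cosine(vector, enrolled[name]))


def diarize(segments, num_speakers=2, iterations=10):
    """Speaker diarization: the speakers are unknown, so cluster the segments
    (toy k-means on cosine similarity) and merge adjacent same-label segments."""
    vectors = [v for _, _, v in segments]
    # Farthest-first initialisation: start from the first vector, then keep
    # adding the vector least similar to every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < num_speakers:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each segment to its most similar centroid.
        labels = [max(range(num_speakers), key=lambda k: cosine(v, centroids[k]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for k in range(num_speakers):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    # Merge consecutive segments with the same label into speaker turns.
    turns = []
    for (start, end, _), lab in zip(segments, labels):
        name = f"speaker_{lab + 1}"
        if turns and turns[-1][2] == name:
            turns[-1] = (turns[-1][0], end, name)
        else:
            turns.append((start, end, name))
    return turns


if __name__ == "__main__":
    for start, end, speaker in diarize(SEGMENTS):
        print(f"{start:.1f}-{end:.1f}s {speaker}")
    enrolled = {"speaker_one": [1.0, 0.0], "speaker_two": [0.0, 1.0]}
    print(identify([0.15, 0.95], enrolled))
```

With these made-up vectors, `diarize` returns three turns (speaker_1 from 0.0 to 2.0 s, speaker_2 from 2.0 to 4.0 s, speaker_1 again from 4.0 to 5.0 s) and `identify` returns `speaker_two`. Note that the diarization labels are arbitrary: they tell you that the same voice comes back at 4.0 s, not whose voice it is. Attaching names requires enrolled references, which is the identification case.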
Google does not implement this feature yet, but some other APIs do; for example, Microsoft has a Speaker Recognition API.
For a description of the algorithms, you can read the book Fundamentals of Speaker Recognition.
For open-source toolkits, you can check Alize and LIUM speaker diarization.

Nikolay Shmyrev