My code uses Google's speech API to recognize what a single person says. For example, if I say 'one, two, three' into my microphone, Google's API returns 'you probably said: one, two, three'.
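For reference, my current single-speaker setup looks roughly like this (a minimal sketch in Python using the speech_recognition package; the file name is just a placeholder):

    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # Load a recording of one speaker (placeholder file name)
    with sr.AudioFile("my_recording.wav") as source:
        audio = recognizer.record(source)

    # Ask Google's web speech API for a transcript
    print(recognizer.recognize_google(audio))  # e.g. "one two three"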
But when my brother and I speak into the microphone at the same time, it doesn't work. For example, I say 'one, two, three' and my brother says 'hello, testing, audio' at the same time. Google's API returns only the words of the speaker who spoke louder: if I speak louder than my brother, it returns what I said; if my brother speaks louder than me, it returns what he said. So I want to use an algorithm that detects all the distinct audio sources in an audio file and then processes each source separately with Google's API. It is not necessary to detect who spoke when. For example:
Audio file over time:

I said          --> one     two      three
My brother said --> hello   testing  audio
Time in seconds --> 1   1.5   2   2.5   3   3.5   4   4.5
So any of the following outputs from the algorithm would be acceptable:
audio = one hello testing three audio
or
audio = one two three hello testing audio
or
my_audio = one two three
my_brother_audio = hello testing audio
And finally, I would send each processed audio to Google's API.
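To make the goal concrete, here is a sketch of the kind of pipeline I imagine. I have read this is called blind source separation; the sketch below uses FastICA from scikit-learn, which, as far as I understand, only works when there are at least as many recording channels (microphones) as speakers. All file names are placeholders:

    import numpy as np
    from scipy.io import wavfile
    from sklearn.decomposition import FastICA

    # Load a two-channel recording: two microphones, two speakers (placeholder file)
    rate, mixed = wavfile.read("two_mic_recording.wav")  # mixed.shape == (n_samples, 2)

    # Blind source separation: ICA recovers statistically
    # independent sources from the mixed channels
    ica = FastICA(n_components=2, random_state=0)
    sources = ica.fit_transform(mixed.astype(np.float64))  # (n_samples, 2)

    # Write each separated speaker to its own file, scaled back to 16-bit PCM
    for i in range(sources.shape[1]):
        track = sources[:, i]
        track = np.int16(track / np.max(np.abs(track)) * 32767)
        wavfile.write("speaker_%d.wav" % i, rate, track)

Each speaker_N.wav could then go through recognize_google as in the first snippet. But my recording comes from a single microphone, and as far as I understand ICA cannot separate more sources than channels, so I am not sure this applies. Hence my question: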
How can I do this? What algorithm should I use to make it possible?