I have audio files that contain interviews with long periods of silence.

Let n be the number of interviews in a given audio file.

I need to split the audio into the periods where the interviews are actually occurring. An interview is characterized by a period of continuously high volume.

Sometimes conversations that are not part of the interview occur within these audio files. For this reason, I need to select the n largest segments/chunks of high noise levels.

At other times there are brief loud noises, such as sirens.

There are always at least a few minutes of low noise levels between interviews. Once an interview starts, the noise level stays consistently high for the duration of that interview.

How can I go about solving this?

Has QUIT--Anony-Mousse
Bijan

1 Answer

First, identify noise.

Then you can measure the length of noise and non-noise periods.
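As a rough illustration of these two steps, here is a minimal sketch in plain Python. It assumes the audio is already decoded into a list of sample values, and that a simple RMS threshold is good enough to separate loud frames from quiet ones. The helper names `frame_rms` and `label_runs`, the frame length and the threshold are all made up for the example and would need tuning on real recordings.

```python
import math

def frame_rms(samples, frame_len):
    """RMS level of each non-overlapping frame of `frame_len` samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def label_runs(levels, threshold):
    """Collapse per-frame levels into (is_loud, start_frame, n_frames) runs."""
    runs = []
    for i, level in enumerate(levels):
        loud = level >= threshold  # assumed rule: loud means at or above threshold
        if runs and runs[-1][0] == loud:
            runs[-1][2] += 1       # same label as the previous frame: extend the run
        else:
            runs.append([loud, i, 1])
    return [tuple(r) for r in runs]

# Toy signal: 4 quiet samples, 8 loud samples, 4 quiet samples.
samples = [0.0] * 4 + [1.0] * 8 + [0.0] * 4
levels = frame_rms(samples, frame_len=4)
runs = label_runs(levels, threshold=0.5)
```

Multiplying `n_frames` by the frame duration gives the length of each run in seconds.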

It would make sense to treat this as an optimization problem. Define a cost function for joining neighboring non-noise segments with the noise in between. Then optimize this.

For example, you could define the cost as "total length after joining - 10 * length of noise within". This prefers long intervals of talking.
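One possible way to turn this into code is the greedy sketch below. It is not a full optimizer, and it makes some assumptions of its own: the value of a joined segment is taken as talking time minus 10 times the enclosed gap time (whether the gap itself should also count toward "total length" is left open here), higher values are treated as better, and two pieces are joined only when the result scores higher than either piece alone. The names `join_segments` and `best_n` are hypothetical.

```python
def join_segments(runs, penalty=10):
    """Greedily join talking runs across short gaps.

    `runs` is a chronological list of (kind, seconds), kind "talk" or "gap".
    Returns a list of [start, end, score] entries, times in seconds.
    """
    segments = []
    clock = 0.0  # current position in the recording
    gap = 0.0    # gap time accumulated since the last talking run
    for kind, seconds in runs:
        if kind == "gap":
            gap += seconds
        else:
            last = segments[-1] if segments else None
            # Joining beats keeping either piece alone exactly when the
            # smaller piece outweighs the gap penalty.
            if last and min(last[2], seconds) > penalty * gap:
                last[1] = clock + seconds
                last[2] += seconds - penalty * gap
            else:
                segments.append([clock, clock + seconds, seconds])
            gap = 0.0
        clock += seconds
    return segments

def best_n(segments, n):
    """Keep the n highest-scoring segments, in chronological order."""
    top = sorted(segments, key=lambda s: s[2], reverse=True)[:n]
    return sorted(top)

runs = [("talk", 120), ("gap", 3), ("talk", 120), ("gap", 300),
        ("talk", 20), ("gap", 200), ("talk", 400)]
segments = join_segments(runs)
interviews = best_n(segments, n=2)
```

In this toy input the two 120-second runs are joined across the 3-second gap (score 120 + 120 - 10*3 = 210), while the isolated 20-second run stays separate and is dropped when only the 2 best segments are kept.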

Has QUIT--Anony-Mousse
  • Do you think you could define these terms in more detail? I am not sure what you mean by 'join' and 'length' in these contexts, especially in your example. Thank you. – Bijan Apr 24 '16 at 03:19
  • You would first cut the file into parts that are too fine, then rejoin them. 120 seconds of talking, 3 seconds of noise, 120 seconds of talking is a good candidate to join, valued e.g. (120 - 10*3 + 120 = 210) – Has QUIT--Anony-Mousse Apr 24 '16 at 06:38