I have audio files that contain interviews with long periods of silence.
Let n be the number of interviews in a given audio file.
I need to split the audio into the periods where the interviews are actually occurring. An interview is characterized by a continuously high volume level.
Sometimes conversations occur within these audio files that are not part of the interview. For this reason, I need to select the n longest segments/chunks of high noise level.
Other times there are brief loud noises, such as sirens.
There is always at least a few minutes of low noise between interviews. Once an interview starts, the noise level is consistently high for the duration of that interview.
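To make the requirement concrete, here is a rough sketch of the kind of thing I have in mind, in Python with NumPy. It is only an illustration under assumptions, not a tested solution: the function name find_interviews, the 1-second frame size, the 0.05 RMS threshold (which assumes samples scaled to [-1, 1]) and the 120-second minimum gap are all placeholders I made up. It measures loudness per frame, joins loud frames that are separated by less than the minimum gap, and keeps the n longest resulting segments.

```python
import numpy as np

def find_interviews(samples, sample_rate, n, frame_sec=1.0,
                    threshold=0.05, min_gap_sec=120.0):
    """Return the n longest loud segments as (start_sec, end_sec), in time order.

    All defaults are assumed placeholder values and would need tuning.
    """
    frame_len = int(frame_sec * sample_rate)
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return []

    # RMS loudness of each non-overlapping frame.
    frames = np.asarray(samples[:n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    loud = rms > threshold

    # Runs of consecutive loud frames as [start_frame, end_frame) pairs.
    padded = np.concatenate(([False], loud, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    runs = edges.reshape(-1, 2)

    # Join runs separated by less than the minimum gap (pauses in speech).
    min_gap_frames = min_gap_sec / frame_sec
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] < min_gap_frames:
            merged[-1][1] = end
        else:
            merged.append([start, end])

    # Keep the n longest segments; short ones (sirens, side chats) drop out.
    longest = sorted(merged, key=lambda s: s[1] - s[0], reverse=True)[:n]
    return sorted((float(s * frame_sec), float(e * frame_sec))
                  for s, e in longest)
```

One weakness I can already see: a siren that falls in the quiet period less than min_gap_sec away from both neighbouring interviews would glue them into a single segment, so very short runs probably need to be discarded before the merge step.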
How can I go about solving this?