I have a corpus of short speech recordings from Kiswahili speakers, and I want to count the number of syllables in each recording. How should I approach this task?
Background: I originally asked a more programming-oriented version of this question on Stack Overflow, showing my attempt at peak detection in R. The question was migrated to Cross Validated, but someone suggested that Signal Processing (SP) might be a better home. I asked on SP Meta how to handle the situation, and a user helpfully advised me to reframe the question more conceptually here on SP.
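For reference, the general idea behind a peak-picking approach is to compute a smoothed amplitude (RMS) envelope and count its prominent local maxima, treating each one as a syllable nucleus. Below is a minimal pure-Python sketch of that idea. It is not my original R code, and every parameter (25 ms frames, 10 ms hop, 5-frame smoothing, a threshold of 0.3 × the envelope maximum, a 100 ms minimum gap between peaks) is an arbitrary assumption rather than a tuned value. `synthetic_syllables` is a hypothetical toy generator (Hann-windowed tone bursts separated by silence) used only to exercise the function.

```python
import math


def rms_envelope(signal, frame_len, hop):
    """Root-mean-square energy of each analysis frame."""
    env = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env


def moving_average(values, width):
    """Centered moving average; the edges use a truncated window."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def count_syllable_peaks(signal, sample_rate, frame_ms=25.0, hop_ms=10.0,
                         smooth_frames=5, rel_threshold=0.3, min_gap_ms=100.0):
    """Count smoothed-envelope peaks as a rough proxy for syllable nuclei.

    All default parameters are illustrative assumptions, not tuned values.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    env = moving_average(rms_envelope(signal, frame_len, hop), smooth_frames)
    if not env:
        return 0
    threshold = rel_threshold * max(env)
    min_gap = max(1, int(min_gap_ms / hop_ms))  # minimum peak spacing in frames
    peaks = []
    for i in range(1, len(env) - 1):
        is_local_max = env[i - 1] < env[i] >= env[i + 1]
        if is_local_max and env[i] >= threshold:
            if peaks and i - peaks[-1] < min_gap:
                # Too close to the previous peak: keep only the larger one.
                if env[i] > env[peaks[-1]]:
                    peaks[-1] = i
            else:
                peaks.append(i)
    return len(peaks)


def synthetic_syllables(n, sample_rate=16000, burst_ms=150, gap_ms=150, f0=200.0):
    """Hypothetical toy signal: n Hann-windowed tone bursts separated by silence."""
    burst = int(sample_rate * burst_ms / 1000)
    gap = [0.0] * int(sample_rate * gap_ms / 1000)
    signal = list(gap)
    for _ in range(n):
        for k in range(burst):
            w = 0.5 - 0.5 * math.cos(2 * math.pi * k / (burst - 1))
            signal.append(w * math.sin(2 * math.pi * f0 * k / sample_rate))
        signal.extend(gap)
    return signal
```

Usage: `count_syllable_peaks(synthetic_syllables(3), 16000)` counts the three tone bursts. On real speech, the result depends heavily on the frame size, smoothing width, threshold, and minimum gap.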