I am a language teacher and web developer. I took an advanced course in statistics in my last year of high school, but that was many, many years ago.
I am developing a web app to help learners of a foreign language hear distinctions that do not occur in their native language. The app asks them to listen to one item from a "minimal pair" of words, and to select which word they think they heard. Getting the pronunciation right can help avoid embarrassing situations.
For example:
treacle or trickle
sheep or ship
sheet or sh*t
beach or b*tch
Learners will continue to work with a particular pair of phonemes until it is clear that they can now distinguish them.
For some phoneme pairings I have more than 50 different minimal pairs of words that I can use (source).
My question is: How can my app calculate when:
- the user can distinguish between a particular pair of words (say sheep and ship), and
- the user can reliably distinguish between two phonemes (here /iː/ and /ɪ/) in any context?
There is no point in continuing to work on a particular pair of words or phonemes any longer than is necessary. I want to stop proposing specific word pairs when they are no longer an interesting challenge, and I want to move on to a new phoneme pair when the user is reliably distinguishing the sounds.
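One standard tool for stopping as soon as the evidence is sufficient is Wald's sequential probability ratio test (SPRT): after every answer it compares "the learner is guessing" with "the learner has mastered the contrast" and stops once either side is clearly ahead. The sketch below is not a tuned implementation; it assumes that a learner who has mastered a contrast answers correctly about 90% of the time (which leaves room for slips of the hand) and that a 5% risk of either wrong decision is acceptable. The function name `sprt_decision` and all default values are hypothetical.

```python
import math

def sprt_decision(correct: int, wrong: int,
                  p_guess: float = 0.5, p_mastered: float = 0.9,
                  alpha: float = 0.05, beta: float = 0.05) -> str:
    """Wald's sequential probability ratio test on the answers so far.

    H0: the learner is guessing (accuracy p_guess).
    H1: the learner has mastered the contrast (accuracy p_mastered < 1,
        so occasional slips of the hand are still expected).
    alpha: accepted risk of declaring mastery for someone who is guessing.
    beta:  accepted risk of declaring "not yet" for someone who has mastered it.
    """
    # Log-likelihood ratio of H1 versus H0 for the answers seen so far.
    llr = (correct * math.log(p_mastered / p_guess)
           + wrong * math.log((1 - p_mastered) / (1 - p_guess)))
    upper = math.log((1 - beta) / alpha)  # crossing this accepts H1
    lower = math.log(beta / (1 - alpha))  # crossing this accepts H0
    if llr >= upper:
        return "mastered"
    if llr <= lower:
        return "not yet"
    return "keep testing"
```

With these assumed defaults, six correct answers in a row are enough, while a single slip raises the requirement to eight correct answers out of nine. A "not yet" result only means the evidence currently favours guessing, so the app would keep practising rather than stop.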
The simplest answer is to let the user decide. However, I would like to calculate a confidence level, so that I can show a progress bar as feedback.
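For a progress bar, one option is to treat the learner's accuracy as unknown and display the probability that it exceeds a target, given the answers so far. The sketch below assumes a uniform Beta(1, 1) prior (a simplification that ignores the 50% guessing floor) and a hypothetical 90% target; `prob_accuracy_above` is an illustrative name. It relies on the identity that, for integer parameters a and b, P(Beta(a, b) > t) = P(Binomial(a + b - 1, t) <= a - 1), so only the standard library is needed.

```python
from math import comb

def prob_accuracy_above(correct: int, wrong: int, threshold: float = 0.9) -> float:
    """Posterior probability that the learner's true accuracy exceeds `threshold`.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + correct, 1 + wrong). Its upper tail is computed as the
    binomial probability P(Binomial(correct + wrong + 1, threshold) <= correct).
    """
    n = correct + wrong + 1
    return sum(comb(n, k) * threshold**k * (1 - threshold)**(n - k)
               for k in range(correct + 1))
```

A fresh learner starts at 0.1 (the prior probability of accuracy above 0.9), the value rises as correct answers accumulate, and a wrong answer lowers it without resetting it to zero, which suits a progress bar.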
I understand that a user may give the right answer by chance half the time. The chance of guessing right five times in a row is 0.5^5 ≈ 3%, so if they give the right answer five times in a row, we can be about 97% confident that they are not just guessing. But they might also give a wrong answer through a slip of the hand, rather than because they cannot distinguish the sounds.
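The five-in-a-row reasoning is a binomial calculation, and it can be extended to tolerate slips by asking how likely a guesser is to get at least k of n answers right. A minimal sketch, assuming a 5% acceptable risk; the helper names `chance_of_guessing` and `answers_needed` are hypothetical:

```python
from math import comb

def chance_of_guessing(correct: int, total: int, p_guess: float = 0.5) -> float:
    """Probability that someone guessing at random gets at least `correct`
    of `total` answers right (the upper tail of a binomial distribution)."""
    return sum(comb(total, k) * p_guess**k * (1 - p_guess)**(total - k)
               for k in range(correct, total + 1))

def answers_needed(max_slips: int, risk: float = 0.05, p_guess: float = 0.5) -> int:
    """Smallest number of answers such that getting all but `max_slips` right
    leaves a chance of pure guessing at or below `risk`."""
    total = max_slips + 1
    while chance_of_guessing(total - max_slips, total, p_guess) > risk:
        total += 1
    return total
```

With a 5% risk, no slips require 5 answers, allowing one slip requires 7 correct out of 8 (guessing chance 9/256 ≈ 3.5%), and allowing two slips requires 9 correct out of 11.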
I cannot be sure that high accuracy with one pair of words means high accuracy with the phoneme pair in general. At this point, I do not have any data to check for correlations between word-pair accuracy and phoneme-pair accuracy.
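Until there is data on how word-pair accuracy relates to phoneme-pair accuracy, one cautious approach is to store answers per word pair (so the relationship can be checked later) and only credit the phoneme pair when the pooled answers span several different word pairs. The sketch below is a hypothetical design, not an established method: the class name, the minimum number of word pairs and the 5% risk are all assumptions. It uses the same binomial tail as the five-in-a-row reasoning above.

```python
from collections import defaultdict
from math import comb

def chance_of_guessing(correct: int, total: int, p_guess: float = 0.5) -> float:
    """Chance that a guesser gets at least `correct` of `total` answers right."""
    return sum(comb(total, k) * p_guess**k * (1 - p_guess)**(total - k)
               for k in range(correct, total + 1))

class PhonemePairTracker:
    """Hypothetical tracker for one phoneme pair, e.g. /iː/ vs /ɪ/.

    Answers are stored per word pair so that word-pair and phoneme-pair
    accuracy can be compared once real data exists. The phoneme pair only
    counts as distinguished when the pooled answers cover several word
    pairs, so success on one memorised pair is not enough on its own.
    """

    def __init__(self, min_word_pairs: int = 5, risk: float = 0.05):
        self.min_word_pairs = min_word_pairs
        self.risk = risk
        # (word_a, word_b) -> [correct, total]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, word_pair: tuple, correct: bool) -> None:
        entry = self.counts[word_pair]
        entry[0] += int(correct)
        entry[1] += 1

    def word_pair_ok(self, word_pair: tuple) -> bool:
        # .get avoids creating an empty entry for an unseen word pair.
        correct, total = self.counts.get(word_pair, (0, 0))
        return total > 0 and chance_of_guessing(correct, total) <= self.risk

    def phoneme_pair_ok(self) -> bool:
        correct = sum(c for c, _ in self.counts.values())
        total = sum(t for _, t in self.counts.values())
        return (len(self.counts) >= self.min_word_pairs
                and chance_of_guessing(correct, total) <= self.risk)
```

In this setup, five correct answers on sheep/ship are enough to retire that word pair, but the phoneme pair is only credited once other word pairs, such as treacle/trickle, have also been answered correctly.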
What I am looking for is advice on how to calculate when enough examples have been given, for the beta version of the app, and how I could refine this calculation once sufficient data has been collected.
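Once data from many learners has been collected, one way to refine the calculation is empirical Bayes: fit a Beta distribution to the observed per-learner accuracy for a phoneme pair and use it instead of a flat prior, so that a new learner's estimate is pulled toward typical performance until their own answers outweigh it. This is a rough sketch: the function names and the (correct, total) data format are hypothetical, and the method of moments used here ignores the extra spread caused by learners having answered different numbers of items.

```python
from statistics import fmean, pvariance

def fit_beta_prior(results: list[tuple[int, int]]) -> tuple[float, float]:
    """Method-of-moments Beta(a, b) fit to per-learner accuracies.

    `results` holds one (correct, total) tuple per learner for a single
    phoneme pair (hypothetical data format).
    """
    rates = [correct / total for correct, total in results if total > 0]
    m = fmean(rates)
    v = pvariance(rates, mu=m)
    if v <= 0 or v >= m * (1 - m):
        raise ValueError("accuracies too uniform or too spread out for a Beta fit")
    strength = m * (1 - m) / v - 1  # a + b: the prior's weight in "pseudo-answers"
    return m * strength, (1 - m) * strength

def posterior_mean(a: float, b: float, correct: int, total: int) -> float:
    """One learner's estimated accuracy after combining the fitted prior
    with their own answers."""
    return (a + correct) / (a + b + total)
```

Because the fitted parameters are generally not integers, the posterior mean (or a library Beta distribution) is simpler to use here than the integer-only binomial identity. The same stored counts would also make it possible to check the correlation between word-pair and phoneme-pair accuracy, for example with `statistics.correlation` (Python 3.10+).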