I have a set $E_{1}$ of finite cardinality $n$ whose members are rectangular matrices containing the useful MFCC coefficients generated from $n$ speech signals. Similarly, I have a set $E_{2}$, of the same cardinality as $E_{1}$, which is a collection of finite-dimensional vectors containing the LPC coefficients of the same speech signals that were used to form $E_{1}$. Together, $D=\{ E_{1},E_{2} \}$ forms the database for the speaker recognition system.

When a test signal is given, its MFCC $M_{i}$ and LPC $L_{i}$ are generated, and the closest members $M_{j} \in E_{1}$ for MFCC and $L_{j} \in E_{2}$ for LPC are found using a distance function $d$. The test features need not coincide exactly with $M_{j}$ and $L_{j}$; how close they are depends on the acoustic environment during the test phase.
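To make the matching step concrete, here is a minimal sketch of the nearest-member search over $E_{2}$, assuming $d$ is the plain Euclidean ($L_{2}$) distance and that every LPC vector has the same length. The function name `nearest_index` and the toy numbers are hypothetical and only serve as illustration; they are not taken from any real recording.

```python
import numpy as np

def nearest_index(query, database):
    """Return (index, distance) of the database row closest to `query`.

    Assumption: d is the Euclidean (L2) distance and all vectors
    have the same dimension.
    """
    db = np.asarray(database, dtype=float)
    q = np.asarray(query, dtype=float)
    # Distance from the query to every stored vector at once.
    dists = np.linalg.norm(db - q, axis=1)
    j = int(np.argmin(dists))
    return j, float(dists[j])

# Toy stand-in for E_2: three enrolled LPC vectors (made-up values).
E2 = np.array([[1.0, -0.5, 0.2],
               [0.8, -0.1, 0.4],
               [1.2, -0.9, 0.1]])

# Toy test vector L_i: close to, but not identical to, the second entry.
L_test = np.array([0.82, -0.15, 0.38])

j, dist = nearest_index(L_test, E2)
print(j, dist)
```

The same search does not carry over directly to $E_{1}$, because MFCC matrices from different utterances generally have different numbers of frames, so an elementwise norm is not defined without some form of alignment.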

  • What distance function is used in the literature?
  • If it is the $L_{2}$ norm, is there a better, more "sensitive" measure that would reduce the possibility of misclassification?
jonsca
Dinesh
  • Distances such as the log-spectral distance are commonly used, but my suggestion is to use a DTW-type (dynamic time warping) algorithm for this scenario. – talk2speech Mar 02 '18 at 09:19
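As a rough illustration of the DTW suggestion in the comment, the sketch below computes a basic dynamic time warping cost between two MFCC matrices. It assumes each matrix is stored with one frame per row and one coefficient per column, uses the Euclidean distance between frames as the local cost, and applies no path constraint or length normalization. The function name `dtw_distance` and the toy data are hypothetical; a real system would probably use a tested library and tune these choices.

```python
import numpy as np

def dtw_distance(A, B):
    """Basic DTW cost between two feature sequences.

    Assumptions: A has shape (frames_a, n_coeffs), B has shape
    (frames_b, n_coeffs), and the local cost is the Euclidean
    distance between a frame of A and a frame of B.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n, m = len(A), len(B)
    # Pairwise frame-to-frame distances, shape (n, m).
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # acc[i, j] = cheapest cost of aligning A[:i] with B[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # A advances, B frame repeated
                acc[i, j - 1],      # B advances, A frame repeated
                acc[i - 1, j - 1],  # both advance
            )
    return float(acc[n, m])

# Toy sequences (made-up values): `slow` is `ref` with repeated frames,
# `shifted` has the same timing but different feature values.
ref = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
slow = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [2.0, 2.0]])
shifted = ref + 1.0

print(dtw_distance(ref, slow), dtw_distance(ref, shifted))
```

The design point is that DTW compares sequences of different lengths by allowing frames to be repeated, so a slower rendition of the same utterance can score as close even though a direct matrix norm is not defined for it.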

0 Answers