
Following up on a previously discussed question, I am trying to solve something more difficult. During cross-validation I obtain, say, $n$ models. The discussed question assumes that the best approach is to train a new classifier on all available data at the end.

My challenge is that I use the EM algorithm, which can give different results from run to run because of its random initialization. For the CV criterion, I can easily run the learning, say, 10 times and select the best model for each block. In this way I can quantify the CV error as $$ CV = \frac{1}{m}\sum_{k=1}^m \min_{l=1,\dots,n} e_{k,l} $$ where $e_{k,l}$ is the error for the $k$th block and the $l$th trial of EM, $m$ is the number of blocks, and $n$ is the number of EM trials.
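To make the criterion concrete, here is a minimal sketch in Python. It assumes the errors have already been collected in a nested list `errors[k][l]` (block $k$, EM trial $l$); the function name `cv_criterion` and the numbers are made up purely for illustration.

```python
def cv_criterion(errors):
    """Mean over blocks of the smallest error among the EM trials.

    errors[k][l] is the error on block k for EM trial l
    (a hypothetical layout chosen for this sketch).
    """
    if not errors or any(len(row) == 0 for row in errors):
        raise ValueError("need at least one block and one trial per block")
    # Inner min: best trial for each block; outer mean: average over blocks.
    return sum(min(row) for row in errors) / len(errors)


# Illustrative numbers only: m = 3 blocks, n = 4 EM trials per block.
errors = [
    [0.20, 0.15, 0.30, 0.25],
    [0.10, 0.40, 0.35, 0.20],
    [0.50, 0.45, 0.25, 0.30],
]
cv = cv_criterion(errors)
```

With these numbers the per-block minima are 0.15, 0.10 and 0.25, so `cv` is their mean.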

My question: How to determine the final classifier?

My attempt: use the whole data set to train the final classifier. That is, run EM, e.g., 10 times and keep the run with the lowest overall error on the whole data set. My concern is that this final model may be over-fitted.
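The attempt above could be sketched roughly as follows. This is only an illustration of the restart-and-select idea, not a recommendation: `fit(data, rng)` and `error(model, data)` are hypothetical callables standing in for one randomly initialized EM run and for the overall error, and the toy threshold "model" is there only to make the sketch runnable.

```python
import random


def best_of_restarts(fit, error, data, n_restarts=10, seed=0):
    """Run `fit` from several random initializations; keep the lowest-error model.

    `fit(data, rng)` and `error(model, data)` are hypothetical stand-ins
    for one EM run and for the error on the whole data set.
    """
    master = random.Random(seed)
    best_model, best_err = None, float("inf")
    for _ in range(n_restarts):
        # Give each restart its own seeded generator so runs are reproducible.
        rng = random.Random(master.randrange(2**32))
        model = fit(data, rng)
        err = error(model, data)
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err


# Toy stand-ins (assumptions, not EM): the "model" is a random threshold.
def toy_fit(data, rng):
    return rng.uniform(0.0, 1.0)


def toy_error(threshold, data):
    # Fraction of points whose label disagrees with the rule x > threshold.
    return sum((x > threshold) != bool(y) for x, y in data) / len(data)


data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
model, err = best_of_restarts(toy_fit, toy_error, data)
```

Note that `err` here is measured on the same data that was used for fitting, which is exactly the source of the over-fitting concern.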

Thank you for any help.

Karel Macek