I'm fitting an HMM to time series, and for each data set I use the BIC to select the optimal number of states. That is, I fit models with different numbers of states and pick the one with the lowest BIC, taking that as an indication that a model with that number of states best describes the data set. Is this procedure correct?
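To make the selection step concrete, here is a minimal sketch of it. It assumes univariate Gaussian emissions (the emission model is not stated here, so the parameter count is an assumption), uses the standard definition BIC = -2 log L + p ln(n), and takes the maximised log-likelihoods as given. The function names and the example log-likelihood values are hypothetical and only there for illustration.

```python
import math


def gaussian_hmm_num_params(n_states: int) -> int:
    """Free parameters of an HMM with 1-D Gaussian emissions (assumed model)."""
    transitions = n_states * (n_states - 1)  # each row of the transition matrix sums to 1
    initial = n_states - 1                   # initial state distribution sums to 1
    emissions = 2 * n_states                 # one mean and one variance per state
    return transitions + initial + emissions


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_n_states(log_likelihoods: dict, n_obs: int):
    """Return the state count with the lowest BIC, plus all BIC scores."""
    scores = {
        k: bic(ll, gaussian_hmm_num_params(k), n_obs)
        for k, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical maximised log-likelihoods for one series of 1000 observations,
# keyed by number of states (made-up numbers, not real results).
example_loglik = {1: -1450.0, 2: -1310.0, 3: -1302.0, 4: -1299.0}
best_k, bic_scores = select_n_states(example_loglik, n_obs=1000)
print(best_k, bic_scores)
```

The log-likelihood always improves as states are added, so the penalty term p ln(n) is what lets a smaller model win. With a different emission model the parameter count would need to change accordingly.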
Across my data (around 500 time series), 2 states usually comes out as best, which is the most desirable outcome because two states are the easiest for me to explain. For around 20% of the series the BIC suggests 3 states is best, and for a handful it suggests 4. Another handful won't calibrate via Baum-Welch, but that's another problem.