I have a time series data set to which I am trying to fit a Hidden Markov Model (HMM) in order to estimate the number of latent states in the data. My pseudocode for this is:
bic = rep(NA, max_number_of_states)
for( i in 2 : max_number_of_states ){
  ...
  fit an HMM with i states
  bic[i] = BIC of the fitted model
  ...
}
optimal_number_of_states = which.min(bic)   # model with the smallest BIC
Now, in ordinary regression models the BIC tends to favor more parsimonious models than the AIC (its penalty of ln n per parameter is harsher than the AIC's 2), but for an HMM I am not sure that is what it is doing. Does anyone know what kind of HMMs the BIC tends to select? I can also obtain the AIC and the log-likelihood. Since I am trying to infer the true number of states, is one of these criteria "better" than the others for this purpose?
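To see how the three quantities differ for an HMM, here is a small sketch that counts the free parameters of a k-state model and the penalty each criterion adds to -2 log L. It assumes 1-D Gaussian emissions with an estimated initial distribution. Counting conventions vary, and the helper names are hypothetical. The raw log-likelihood carries no penalty and, at the global maximum-likelihood fit, never decreases when a state is added, so on its own it cannot choose k. Because ln n > 2 once n ≥ 8, BIC charges more for every extra parameter than AIC. For the same set of fitted models, BIC therefore never selects more states than AIC does.

```python
import math


def gaussian_hmm_n_params(k):
    """Free parameters of a k-state HMM with 1-D Gaussian emissions (assumed model)."""
    initial = k - 1            # initial distribution sums to 1
    transitions = k * (k - 1)  # each row of the transition matrix sums to 1
    emissions = 2 * k          # one mean and one variance per state
    return initial + transitions + emissions


def penalties(k, n):
    """(AIC, BIC) penalty terms that are added to -2 * log-likelihood."""
    p = gaussian_hmm_n_params(k)
    return 2 * p, p * math.log(n)


n = 600  # hypothetical series length
for k in range(2, 6):
    aic_pen, bic_pen = penalties(k, n)
    print(f"k={k}: params={gaussian_hmm_n_params(k)}, "
          f"AIC penalty={aic_pen}, BIC penalty={bic_pen:.1f}")
```

For example, going from 3 to 4 states adds 9 parameters. Under AIC the 4-state fit must raise the log-likelihood by more than 9 to be preferred, while under BIC with n = 600 it must raise it by more than 9 · ln 600 / 2 ≈ 28.8. The gap widens as k grows because the transition matrix adds roughly 2k parameters per extra state.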