BIC is not going to be an effective criterion for model selection in this case. The authors of ChromHMM performed an integrative analysis of chromatin states using ENCODE data and found that both BIC and AIC favored models with more states than humans could meaningfully interpret in terms of biological significance. See Hoffman et al. (Nucleic Acids Research, 2013) for more details. If I recall correctly, the authors settled on 25 states in that analysis.
The issue is that we can't expect BIC/AIC to pick a parsimonious model here: adding states (and therefore parameters) raises the likelihood by more than the penalty for the new parameters. With genome-wide data, the likelihood gain is summed over millions of genomic bins, whereas the BIC penalty grows only with the logarithm of the number of bins (and the AIC penalty not at all), so even a tiny per-bin improvement pays for an extra state. It's not immediately apparent whether the relatively high number of states selected by BIC/AIC reflects genuine complexity in biological chromatin states or a tendency toward overfitting. Unfortunately, the functional/biological significance of a large number of model states can only be resolved at the lab bench, and there aren't (currently) any great methods to probe the functional consequences of combinatorial chromatin configurations at scale.
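To make the penalty argument concrete, here is a minimal Python sketch. It assumes a ChromHMM-style parameterization (one independent Bernoulli emission probability per state and mark, a full K x K transition matrix, and an initial-state distribution) and the standard definitions BIC = -2 ln L + k ln n and AIC = -2 ln L + 2k, where k is the number of free parameters and n the number of observations (genomic bins). The helper names are hypothetical, and the genome size (~3.1 Gb) and 200 bp bin size in the usage example are illustrative assumptions.

```python
import math


def chromhmm_free_params(num_states: int, num_marks: int) -> int:
    """Free parameters of an HMM with independent Bernoulli emissions per mark.

    Assumes: one presence probability per (state, mark) pair, a K x K
    transition matrix whose rows each sum to 1, and an initial-state
    distribution that sums to 1.
    """
    k, m = num_states, num_marks
    emissions = k * m
    transitions = k * (k - 1)  # each row loses one degree of freedom
    initial = k - 1
    return emissions + transitions + initial


def bic(log_likelihood: float, num_params: int, num_obs: int) -> float:
    """Bayesian information criterion; lower is better."""
    return -2.0 * log_likelihood + num_params * math.log(num_obs)


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion; lower is better."""
    return -2.0 * log_likelihood + 2.0 * num_params


def min_gain_for_extra_state(num_states: int, num_marks: int, num_obs: int) -> float:
    """Log-likelihood gain (nats) a (K+1)-state model needs over K states to win on BIC.

    BIC(K+1) < BIC(K)  <=>  LL(K+1) - LL(K) > (delta_k / 2) * ln(n).
    """
    delta_k = (chromhmm_free_params(num_states + 1, num_marks)
               - chromhmm_free_params(num_states, num_marks))
    return 0.5 * delta_k * math.log(num_obs)


if __name__ == "__main__":
    n_bins = 3_100_000_000 // 200  # assumed genome size / assumed bin size
    gain = min_gain_for_extra_state(15, 5, n_bins)
    print(f"15 -> 16 states, 5 marks: need > {gain:.1f} nats "
          f"({gain / n_bins:.2e} nats per bin)")
```

Under these assumptions, going from 15 to 16 states with five marks adds 36 free parameters, so the 16-state model wins on BIC if it improves the log-likelihood by roughly 300 nats, or about 2e-5 nats per bin. An extra state that captures even a faint but recurrent combination of marks can clear that bar, which is why BIC keeps favoring larger models on genome-scale data.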
You may also consider the Roadmap Epigenomics Project as a source of community standards in this regard. Although the work is not yet fully published, the analysis group integrated ~127 epigenomes using five histone marks and fit ChromHMM models ranging from 10 to 25 states. In the end, they selected the 15-state model as an appropriate balance of model complexity versus interpretability. Adding more histone marks or TFs would likely shift the optimal trade-off between complexity and interpretability toward a model with even more states.
At this point, interpretability, as the biological research community applies it, is a moving standard defined largely by how redundant the states look when the segmented genome is compared to gold-standard functional annotations (TSSs, gene bodies, enhancers, etc.). The logic is, of course, decidedly circular: more complex models may well describe real functional categories of chromatin elements that we can't detect, because our limited library of secondary annotations gives us no basis for recognizing them. In other words, we only see what we already know.
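The annotation-based comparison can be sketched as follows: compute each state's fold enrichment for each annotation, P(annotation | state) / P(annotation), which is similar in spirit to what ChromHMM's OverlapEnrichment output reports, then flag pairs of states whose enrichment profiles are nearly indistinguishable. This is a minimal illustration, not the Roadmap or ENCODE procedure: the correlation-based redundancy test, the 0.95 threshold, the helper names, and the toy per-bin data are all hypothetical choices.

```python
import math
from itertools import combinations


def fold_enrichment(states, annotations):
    """Fold enrichment of each state for each annotation, computed over bins.

    states: per-bin state labels (one int per genomic bin).
    annotations: {name: per-bin 0/1 flags}, each the same length as `states`.
    Returns {state: {annotation_name: P(annotation | state) / P(annotation)}}.
    """
    n = len(states)
    for name, mask in annotations.items():
        if len(mask) != n:
            raise ValueError(f"annotation {name!r} has {len(mask)} bins, expected {n}")
    result = {}
    for s in sorted(set(states)):
        idx = [i for i, label in enumerate(states) if label == s]
        row = {}
        for name, mask in annotations.items():
            genome_frac = sum(mask) / n
            state_frac = sum(mask[i] for i in idx) / len(idx)
            row[name] = state_frac / genome_frac if genome_frac > 0 else math.nan
        result[s] = row
    return result


def pearson(x, y):
    """Pearson correlation; returns None if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def redundant_state_pairs(enrichment, threshold=0.95):
    """Pairs of states whose enrichment profiles correlate at or above `threshold`."""
    names = sorted(next(iter(enrichment.values())))
    pairs = []
    for a, b in combinations(sorted(enrichment), 2):
        r = pearson([enrichment[a][k] for k in names],
                    [enrichment[b][k] for k in names])
        if r is not None and r >= threshold:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Toy segmentation of 8 bins into 3 states (hypothetical data).
    states = [0, 0, 1, 1, 2, 2, 2, 2]
    annotations = {
        "tss":       [1, 0, 1, 0, 0, 0, 0, 0],
        "gene_body": [0, 1, 0, 1, 1, 0, 0, 0],
        "enhancer":  [0, 0, 0, 0, 1, 1, 1, 0],
    }
    fe = fold_enrichment(states, annotations)
    print(fe)
    print("candidate redundant pairs:", redundant_state_pairs(fe))
```

In the toy data, states 0 and 1 have identical enrichment profiles and would be flagged as candidates for merging. Note that a state with no enrichment for any annotation in the library would look uninformative under this test even if it were biologically real, which is exactly the circularity described above. Per-bin Python lists are fine for illustration; genome-scale data would call for arrays or interval tools.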