I am looking for pointers (papers, algorithms, etc.) on learning sequence-tagging models that allow for additional structure. Consider part-of-speech tagging: I could train a CRF, which would work fine, but suppose I want to impose an additional requirement on my sequence of tags. For example, I know that there is only one Noun per sentence.
Is my only option to add a factor connecting all my hidden states and treat this as a general inference problem in graphical models?
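To make the constraint concrete, here is a minimal sketch of decoding under it, reading "only one Noun" as "exactly one Noun". It assumes a linear-chain model whose scores are already available as a per-position emission table and a tag-to-tag transition table. The function name `constrained_viterbi` and the arguments `emissions`, `transitions`, and `noun_tag` are hypothetical, not part of any particular CRF library. Instead of a factor over all hidden states, the Viterbi state is augmented with a flag recording whether a Noun has already been used. This covers decoding for this one constraint only; it does not address training or more general global constraints.

```python
NEG_INF = float("-inf")


def constrained_viterbi(emissions, transitions, noun_tag):
    """Highest-scoring tag sequence that contains exactly one `noun_tag`.

    emissions[t][y]   : score of tag y at position t (assumed given)
    transitions[a][b] : score of moving from tag a to tag b (assumed given)
    Returns (tags, score), or (None, -inf) if no sequence satisfies the constraint.
    """
    n, k = len(emissions), len(emissions[0])
    # best[t][c][y]: best score of a prefix ending at position t with tag y,
    # having used c Nouns so far (c is 0 or 1).
    best = [[[NEG_INF] * k for _ in range(2)] for _ in range(n)]
    back = [[[None] * k for _ in range(2)] for _ in range(n)]

    for y in range(k):
        c = 1 if y == noun_tag else 0
        best[0][c][y] = emissions[0][y]

    for t in range(1, n):
        for y in range(k):
            is_noun = 1 if y == noun_tag else 0
            # Emitting a Noun moves the count from 0 to 1; a second Noun is
            # never allowed, so c = 2 is simply not represented.
            for c in range(is_noun, 2):
                prev_c = c - is_noun
                for p in range(k):
                    prev = best[t - 1][prev_c][p]
                    if prev == NEG_INF:
                        continue
                    s = prev + transitions[p][y] + emissions[t][y]
                    if s > best[t][c][y]:
                        best[t][c][y] = s
                        back[t][c][y] = p

    # Only end states with exactly one Noun used are acceptable.
    y = max(range(k), key=lambda j: best[n - 1][1][j])
    score = best[n - 1][1][y]
    if score == NEG_INF:
        return None, NEG_INF

    # Follow back-pointers, undoing the Noun count when passing the Noun.
    tags, c = [y], 1
    for t in range(n - 1, 0, -1):
        p = back[t][c][y]
        if y == noun_tag:
            c -= 1
        y = p
        tags.append(y)
    tags.reverse()
    return tags, score
```

The cost is the usual Viterbi cost times a constant factor of two, since the state space only doubles. The same trick extends to "at most m Nouns" by letting the counter range over 0..m, but it does not obviously scale to constraints that cannot be tracked with a small amount of extra state.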