Using handwriting recognition as an example, we can train various models to recognise individual characters, but for the results to actually be useful we must also incorporate prior knowledge of common character sequences, words, and word sequences. How is this generally done?
I have a rough understanding of Hidden Markov Models, but doesn't the 'Markov property' say that the next letter depends only on the current letter, rather than on (say) the last three letters? What if I want to incorporate common patterns of letters within words, and also common patterns of words within sentences, in the same algorithm?
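To make my confusion concrete, here is a minimal sketch (in Python; the function names and toy corpus are just my own invention) of what I imagine a higher-order character model looks like: the "state" is redefined as the last `order` characters, so formally the Markov property still holds over these augmented states, even though the prediction effectively uses several previous letters. Is this the standard way around the limitation, and how would it be combined with a word-level model?

```python
from collections import defaultdict, Counter

def train_char_model(corpus, order=2):
    """Count how often each character follows the previous `order` characters."""
    counts = defaultdict(Counter)
    for word in corpus:
        padded = "^" * order + word + "$"       # start/end padding markers
        for i in range(order, len(padded)):
            context = padded[i - order:i]       # state = last `order` characters
            counts[context][padded[i]] += 1
    return counts

def next_char_probs(counts, context):
    """Relative frequency of each possible next character given the context."""
    total = sum(counts[context].values())
    return {c: n / total for c, n in counts[context].items()}

corpus = ["the", "then", "them", "they", "there"]
model = train_char_model(corpus, order=2)
print(next_char_probs(model, "th"))   # {'e': 1.0} on this toy corpus
```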
I know this is a fairly broad question, but I have not been able to find a good overview of how this is done. I would appreciate any references for further reading.