I am new to ML and I'm having trouble figuring out how to implement a Maximum Entropy Markov Model (MEMM) for a sequence labeling task. Given this MEMM equation $$ P_{s'}(s\mid o)=\frac{1}{Z(o,s')}\exp\left(\sum_{a}\lambda_{a}f_{a}(o,s)\right) $$ where $s'$ is the previous classification, $s$ is the current classification, and $o$ is the current observation.
And the following data
+--------+-----+
| Word   | Tag |
+--------+-----+
| The    | DT  |
| book   | NN  |
| I      | PRP |
| read   | V   |
| was    | V   |
| pretty | JJ  |
| good   | JJ  |
| .      | .   |
| I      | PRP |
| told   | V   |
| him    | PRP |
| to     | TO  |
| book   | V   |
| me     | PRP |
| a      | DT  |
| flight | NN  |
| .      | .   |
| Could  | MD  |
| I      | PRP |
| borrow | V   |
| your   | PRP |
| book   | NN  |
| ?      | ?   |
+--------+-----+
What are the steps involved for calculating the $Z$ factor of $P(NN|\text{book}, DT)$?
$$ Z(x)=\sum_{c'\in C}\exp\left(\sum_{i=0}^{N}w_{c'i}f_{i}\right) $$
Assume all feature weights are $1$ and all feature functions are binary indicators of the presence of both strings (the observed word and the tag).
My confusion comes from the first summation symbol in the $Z$-factor equation, which appears to ask us to sum the exponentiated scores over all potential classifications $c'\in C$, and not just the particular one provided.
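To make my interpretation concrete, here is a minimal sketch in Python of how I currently understand the $Z$ computation. It assumes one binary indicator feature per $(\text{word}, \text{tag})$ pair that appears in the table above, with every weight set to $1$; under that assumption the previous tag $s'$ does not change the sum, so it is omitted. These assumptions may well be where I am going wrong.

```python
import math

# (word, tag) pairs copied from the table above.
corpus = [
    ("The", "DT"), ("book", "NN"), ("I", "PRP"), ("read", "V"),
    ("was", "V"), ("pretty", "JJ"), ("good", "JJ"), (".", "."),
    ("I", "PRP"), ("told", "V"), ("him", "PRP"), ("to", "TO"),
    ("book", "V"), ("me", "PRP"), ("a", "DT"), ("flight", "NN"),
    (".", "."), ("Could", "MD"), ("I", "PRP"), ("borrow", "V"),
    ("your", "PRP"), ("book", "NN"), ("?", "?"),
]
tags = sorted({t for _, t in corpus})  # 9 distinct tags

# Assumption: one binary feature per (word, tag) pair observed in the
# data, each with weight 1.
features = set(corpus)

def z_factor(word):
    """Z(o) = sum over candidate tags s of exp(sum_a w_a * f_a(o, s))."""
    return sum(math.exp(1.0 if (word, s) in features else 0.0)
               for s in tags)

# "book" was seen as NN and V, so two terms contribute exp(1) and the
# remaining seven tags contribute exp(0):
print(z_factor("book"))  # 2e + 7 ≈ 12.4366
```

Is this the right way to read the outer sum, or should the terms also depend on the previous tag $s'=DT$?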