
I was going over my old NLP course slides, and one of the pages is about using the Structured Perceptron for tagging. It claims that because the algorithm is discriminative, it does not care about modeling the probability of the language, and thus "every model feature should involve at least one tag."
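For concreteness, here is a minimal sketch of the kind of feature function I take the slide to mean (the function and template names are my own, not from the slides): every template pairs something observed in the sentence with at least one tag.

```python
def features(words, tags, i):
    """Feature templates for position i of a tagged sentence.

    Every template involves at least one tag, so the model scores
    tag sequences rather than modeling the words themselves.
    """
    w, t = words[i], tags[i]
    prev_t = tags[i - 1] if i > 0 else "<START>"
    return [
        f"word={w}+tag={t}",                 # emission-style: word conjoined with its tag
        f"prev_tag={prev_t}+tag={t}",        # transition-style: tag bigram
        f"is_cap={w[0].isupper()}+tag={t}",  # word shape, still conjoined with the tag
    ]
```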

I can understand that neither a discriminative classifier nor a discriminative tagger cares about modeling the probability of the language; that is what distinguishes "discriminative" from "generative." I can also understand why "every model feature should involve at least one tag": it keeps the model focused on the probability of the tag sequence rather than on anything closer to the probability of the language.

My question is:

Why do we still almost exclusively use language features (unigrams, bigrams, word capitalization, etc.) when we feature-engineer a discriminative classifier (e.g. an SVM)? Why don't we say "every model feature in a discriminative classifier should involve the class category y"?
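To illustrate what I mean by "language features," here is a rough sketch of a typical feature extractor one might write for an SVM text classifier (the names are hypothetical); note that, unlike the tagger templates above, none of these features mention the class label y:

```python
def classifier_features(words):
    """Input-only features for a document classifier (e.g. an SVM).

    No feature mentions the class label y; every feature describes
    only the language side of the input.
    """
    feats = {}
    for i, w in enumerate(words):
        feats[f"unigram={w.lower()}"] = 1.0
        if i + 1 < len(words):
            feats[f"bigram={w.lower()}_{words[i + 1].lower()}"] = 1.0
        if w[0].isupper():
            feats["has_capitalized_word"] = 1.0
    return feats
```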

Yan Yang
