It is said that a feature function can represent anything: the first or the last word of a sentence, a capitalized word, and so on. But how exactly can I represent such things in the form $F_j(x, y)$ or $\sum_i f_j(y_{i-1}, y_i, \bar x, i)$, as explained in this tutorial? Could anyone please help by instantiating one in mathematical notation?
2 Answers
In the simplest case, suppose $p(y|x,w)$ is a Gaussian distribution centered at $x$:
$$p(y|x,w)=\frac{1}{Z}\exp\big(\langle w, F(x,y)\rangle\big)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y-x)^2}{2\sigma^2}\right).$$
We can then interpret the feature function as $F(x,y)=(y-x)^2$; the parameter is $w=-1/(2\sigma^2)$ and the normalizer is $Z=\sqrt{2\pi}\,\sigma$.
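A quick numerical sanity check of that correspondence (a minimal sketch; the test points and function names are made up for illustration): evaluating $\exp(w\,F(x,y))/Z$ with the scalar feature should reproduce the usual Gaussian density.

```python
import math

def gaussian_pdf(y: float, x: float, sigma: float) -> float:
    """Gaussian density N(y; x, sigma^2) in its usual closed form."""
    return math.exp(-(y - x) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def log_linear_pdf(y: float, x: float, sigma: float) -> float:
    """The same density written as exp(<w, F(x, y)>) / Z with a scalar feature."""
    F = (y - x) ** 2                     # feature function F(x, y) = (y - x)^2
    w = -1.0 / (2 * sigma ** 2)          # parameter w = -1 / (2 sigma^2)
    Z = math.sqrt(2 * math.pi) * sigma   # normalizer; does not depend on y
    return math.exp(w * F) / Z

for x, y, sigma in [(0.0, 0.5, 1.0), (1.0, -2.0, 0.3), (2.5, 2.4, 2.0)]:
    # the two columns agree up to floating-point rounding
    print(gaussian_pdf(y, x, sigma), log_linear_pdf(y, x, sigma))
```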
There is more than one way to construct the feature function in this case. We can also let it produce a vector, $F(x,y)=(x^2, xy, y^2)$; setting the parameter to $w=\left(-\frac{1}{2\sigma^2}, \frac{1}{\sigma^2}, -\frac{1}{2\sigma^2}\right)$ then gives the same result, since $-\frac{(y-x)^2}{2\sigma^2}=-\frac{x^2}{2\sigma^2}+\frac{xy}{\sigma^2}-\frac{y^2}{2\sigma^2}$.
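To see that the two parameterizations give the same exponent, here is a small sketch (function names and the sample values are illustrative only) computing $\langle w, F(x,y)\rangle$ both ways:

```python
def scalar_score(x: float, y: float, sigma: float) -> float:
    """<w, F(x, y)> with the scalar feature F = (y - x)^2 and w = -1/(2 sigma^2)."""
    return -1.0 / (2 * sigma ** 2) * (y - x) ** 2

def vector_score(x: float, y: float, sigma: float) -> float:
    """<w, F(x, y)> with the vector feature F = (x^2, xy, y^2)."""
    F = (x * x, x * y, y * y)
    w = (-1.0 / (2 * sigma ** 2), 1.0 / sigma ** 2, -1.0 / (2 * sigma ** 2))
    return sum(wi * fi for wi, fi in zip(w, F))

print(scalar_score(1.0, 3.0, 0.5), vector_score(1.0, 3.0, 0.5))  # -8.0 -8.0
```

Because the $x^2$ component does not involve $y$, dropping it changes only the normalizer, which would then depend on $x$.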
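When $y$ is a categorical variable taking one of $N$ values $y_1,\dots,y_N$, consider logistic (softmax) regression with a scalar feature $\phi=\phi(x)$ of the input:
$$p(y_k|x,w)=\frac{1}{Z}\exp\big(\langle w, F(x,y_k)\rangle\big)=\frac{\exp(w_k\phi)}{\sum_{i=1}^{N}\exp(w_i\phi)}.$$
The feature function $F(x,y_k)$ should return an $N$-dimensional vector with $\phi$ as its $k$-th element and 0 elsewhere, so that $\langle w, F(x,y_k)\rangle=w_k\phi$. (If $\phi(x)$ is itself a vector, $F$ places it in the $k$-th block of a longer vector instead.)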
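As a sketch of this construction (assuming a scalar $\phi$ as above; the weights and the value of $\phi$ are arbitrary illustrative numbers), building $F(x,y_k)$ as a one-hot-placed vector and normalizing $\exp\langle w, F(x,y_k)\rangle$ over $k$ recovers the ordinary softmax:

```python
import math
from typing import List

def feature_vector(phi: float, k: int, n_classes: int) -> List[float]:
    """F(x, y_k): phi in position k, zeros elsewhere."""
    F = [0.0] * n_classes
    F[k] = phi
    return F

def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))

def class_probs(phi: float, w: List[float]) -> List[float]:
    """p(y_k | x, w) = exp(<w, F(x, y_k)>) / Z for every class k."""
    n = len(w)
    scores = [dot(w, feature_vector(phi, k, n)) for k in range(n)]  # each equals w_k * phi
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    Z = sum(exps)
    return [e / Z for e in exps]

def softmax_direct(phi: float, w: List[float]) -> List[float]:
    """The right-hand side exp(w_k phi) / sum_i exp(w_i phi), computed directly."""
    exps = [math.exp(wi * phi) for wi in w]
    Z = sum(exps)
    return [e / Z for e in exps]

w = [0.5, -1.2, 2.0]   # hypothetical weights, one per class
phi = 1.7              # hypothetical scalar feature phi(x)
print(class_probs(phi, w))
print(softmax_direct(phi, w))   # same values up to rounding
```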

In a CRF, for instance, $X$ is a set of observed variables and $Y$ is a set of target variables.
Since $X$ is fully observed, a factor $\phi_i (X, Y_i)$ can be any function of $X$. If the factor has a logistic-regression (log-linear) form, then the features are simply the variables in $X$, or any combination or transformation of them, just as in an ordinary logistic regression task.
For instance (in the general case), suppose we believe $X_1$ should equal $X_2$ for $Y_i$ to be in a particular state $Y_c$. We then simply define the indicator $\mathbb{1}[X_1 = X_2]\cdot\mathbb{1}[Y_i = Y_c]$ as one feature. When this feature holds and its learned weight is positive, the energy of the configuration is low and its probability is high.
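Tying this back to the question's linear-chain notation $F_j(x,y)=\sum_i f_j(y_{i-1}, y_i, \bar x, i)$, here is a minimal sketch in which each $f_j$ is an indicator of the kind described above. The label set (`NAME`, `O`, `PUNCT`), the feature names, the sentence, and the weights are all hypothetical, and $p(y|x)$ is normalized by brute-force enumeration, which is only feasible for tiny inputs (a real CRF uses dynamic programming).

```python
import itertools
import math
from typing import List, Optional

LABELS = ["NAME", "O", "PUNCT"]  # hypothetical label set

def is_capitalised(word: str) -> bool:
    return word[:1].isupper()

# Each feature has the signature f_j(y_prev, y_cur, xs, i) and returns 0.0 or 1.0.
def f_first_word_capitalised_name(y_prev, y_cur, xs, i):
    # first word of the sentence, starts with a capital letter, labelled NAME
    return float(i == 0 and is_capitalised(xs[i]) and y_cur == "NAME")

def f_name_follows_name_capitalised(y_prev, y_cur, xs, i):
    # transition feature: NAME -> NAME on a capitalised word
    return float(y_prev == "NAME" and y_cur == "NAME" and is_capitalised(xs[i]))

def f_last_word_punct(y_prev, y_cur, xs, i):
    # last word of the sentence labelled PUNCT
    return float(i == len(xs) - 1 and y_cur == "PUNCT")

FEATURES = [f_first_word_capitalised_name, f_name_follows_name_capitalised, f_last_word_punct]

def global_features(xs: List[str], ys: List[str]) -> List[float]:
    """F_j(x, y) = sum_i f_j(y_{i-1}, y_i, x, i), with y_{-1} = None as a start marker."""
    totals = [0.0] * len(FEATURES)
    for i in range(len(xs)):
        y_prev: Optional[str] = ys[i - 1] if i > 0 else None
        for j, f in enumerate(FEATURES):
            totals[j] += f(y_prev, ys[i], xs, i)
    return totals

def score(weights: List[float], xs: List[str], ys: List[str]) -> float:
    """<w, F(x, y)>, i.e. the negative energy of the labelling ys."""
    return sum(w * F for w, F in zip(weights, global_features(xs, ys)))

def prob(weights: List[float], xs: List[str], ys: List[str]) -> float:
    """p(y | x) normalized by enumerating every labelling (tiny inputs only)."""
    Z = sum(math.exp(score(weights, xs, list(c)))
            for c in itertools.product(LABELS, repeat=len(xs)))
    return math.exp(score(weights, xs, ys)) / Z

xs = ["John", "Smith", "lives", "here", "."]
ys = ["NAME", "NAME", "O", "O", "PUNCT"]
print(global_features(xs, ys))        # [1.0, 1.0, 1.0]
print(prob([0.0, 0.0, 0.0], xs, ys))  # uniform: 1 / 3**5
print(prob([1.5, 1.0, 2.0], xs, ys))  # higher, since all three features fire
```

With zero weights every labelling has the same energy; positive weights lower the energy of labellings on which the indicators fire, raising their probability.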
