When do Markov random fields $\neq$ exponential families?

Question

In their textbook, Graphical Models, Exponential Families and Variational Inference, M. Jordan and M. Wainwright discuss the connection between Exponential families and Markov Random Fields (undirected graphical models).

I am trying to understand better the relationship between them with the following questions:

Are all MRFs members of the Exponential families?
Can all members from the Exponential families be represented as an MRF?
If MRFs $\neq$ Exponential families, what are some good examples of distributions of one type not ncluded in the other ?

From what I understand in their textbook (Chapter 3), Jordan and Wainwright present the next argument:

Say we have a a scalar random variable X that follows some distribution $p$, and draw $n$ i.i.d. observations $X^1, \ldots X^n$, and we want to identify $p$.
We compute the empirical expectations of certain functions $\phi_\alpha%$

$\hat{\mu}_\alpha= \frac{1}{n}\sum^n_{i=1}\phi_\alpha(X^i), $ for all $\alpha \in \mathcal{I}$

where each $\alpha$ in some set $\mathcal{I}$ indexes a function $\phi_\alpha: \mathcal{X} \rightarrow R$
Then if we force the following two sets of quantities to be consistent, i.e. to match (to identify $p$):
- The expectations $E_p[(\phi_\alpha(X)]=\int_\mathcal{X}\phi_\alpha(x)p(x)\nu(dx)$ of the sufficient statistics $\phi$ of the distribution $p$
- The expectations under the empirical distribution

we get an underdetermined problem, in the sense that there are many distributions $p$ that are consistent with the observations. So we need a principle for choosing among them (to identify $p$).

If we use the principle of maximum entropy to remove this undeterminancy, we can get a single $p$:

$\DeclareMathOperator*{\argmax}{arg\,max} p^* = \argmax_{p\in{\mathcal{P}}} \,H(p)$ subject to $E_p[(\phi_\alpha(X)] = \hat{\mu}_\alpha$ for all $\alpha \in \mathcal{I}$

where this $p^*$ takes the form $p_\theta(x) \propto $ exp${\sum_{\alpha \in \mathcal{I}}\theta_\alpha \phi_\alpha(x)},$ where $\theta \in R^d$ represents a parameterization of the distribution in exponential family form.

In other words, if we

Make the expectations of the distributions be consistent with the expectations under the empirical distribution
Use the principle of maximum entropy to get rid of undetermination

$\rightarrow$ We end up with a a distribution of the exponential family.

However, this looks more like an argument to introduce exponential families, and (as far as I can understand) it does not describe the relationship between MRFs and exp. families. Am I missing anything?

I think there is some confusion there: [MRFs]( http://en.wikipedia.org/wiki/Markov_random_field) are not defined according to the maximum entropy principle, but on their own right, by the fact the density factorises according to the cliques of the graph. MRFs are exponential families, due to their log-linear representation. — Xi'an, Apr 14 '12 at 20:57
Thanks @Xi'an. This part "*MRFs are defined by the fact the density factorises according to the cliques of the graph*" is what I always thought defines an MRF. But why does this property make all MRFs be part of the exponential families? And what are examples (if there are any) of either type (MRFs or exp. families) that are not members the other type? — Amelio Vazquez-Reina, Apr 14 '12 at 21:13
I am not sure how much it will add for you, but one thing that may make it clearer is reading the original formulation of Gibbs distributions and MRFs in [this paper](http://www.umiacs.umd.edu/~raghuram/ENEE731/MRF/GemanPAMI84.pdf) by Geman and Geman. Basically, the whole idea is to model something with a Boltzman distribution (exp to the minus something) and then ask how the something factorizes. Because of this way of describing it, it may be more obvious their connection to exponential families. — ely, Apr 14 '12 at 21:52
Exponential families are defined by the fact that the log density is essentially a scalar product of a vectorial function of the observations and of a vectorial function of the parameters. There is no graphical structure involved in this definition. MRFs involve in addition a graph that defines the cliques, the neighbourhoods, &tc. Hence, MRFs are exponential families with an added structure, the graph. — Xi'an, Apr 15 '12 at 07:29
I guess the confusion in contradicting comments/answer comes down to whether you are allowed to introduce factors which are not loglinear with respect to their parameters. — Yaroslav Bulatov, Jul 22 '12 at 03:33

score 16 · Accepted Answer · answered Apr 29 '12 at 06:48

You are entirely correct -- the argument you presented relates the exponential family to the principle of maximum entropy, but doesn't have anything to do with MRFs.

To address your three initial questions:

Can all members from the Exponential families be represented as an MRF?

Yes. In fact, any density or mass function can be represented as an MRF! According to Wikipedia [1], an MRF is defined as a set of random variables that are Markov with respect to an undirected graph. Equivalently, the joint distribution of the variables can be written with the following factorization: $$P(X=x) = \prod_{C \in cl(G)} \phi_C(X_C = x_C)$$ where $cl(G)$ is the set of maximal cliques in $G$. From this definition you can see that a fully connected graph, while completely uninformative, is consistent with any distribution.

Are all MRFs members of the Exponential families?

No. Since all distributions can be represented as MRFs (and not all distributions belong to the exponential family) there must be some "MRF members" that are not exponential family members. Still, this is a perfectly natural question -- it seems like the vast majority of MRFs people use in practice $are$ exponential family distributions. All finite-domain discrete MRFs and Gaussian MRFs are members of the exponential family. In fact, since products of exponential family distributions are also in the exponential family, the joint distribution of any MRF in which every potential function has the form of an (unnormalized) exponential family member will itself be in the exponential family.

If MRFs $\neq$ Exponential families, what are some good examples of distributions of one type not included in the other?

Mixture distributions are common examples of non-exponential family distributions. Consider the linear Gaussian state space model (like a hidden Markov model, but with continuous hidden states and Gaussian transition and emission distributions). If you replace the transition kernel with a mixture of Gaussians, the resulting distribution is no longer in the exponential family (but it still retains the rich conditional independence structure characteristic of practical graphical models).

[1] http://en.wikipedia.org/wiki/Markov_random_field

When do Markov random fields $\neq$ exponential families?

1 Answers1

Linked