Inferring a Markov chain from its invariant measure

Question

Given a probability measure $p$ on $\{1,\dots,n\}$ assumed to be the invariant measure of some irreducible ergodic Markov chain with unknown transition matrix $P$, i.e., $p = pP$, what (if any) problems about inferring a unique/optimal $P$ have been considered?

Examples of side conditions that might be in the literature somewhere and would adequately specify a solution: requiring $P$ to be as sparse as possible together with some other condition, such as a few specific entries being known or the sparsity pattern itself being completely known; or requiring that $P$ be the exponential of some sparse (obviously embeddable) generator matrix; etc.

NB. I am aware that there are many stochastic matrices with a given invariant measure, that the invertible ones form a Lie semigroup, etc. Generic answers along these lines (i.e., that do not have some pointer to the literature or an actual special case that's been considered or solved or some such) are not helpful.

Welcome to our site. Generic questions like this one invite generic answers, so if you could edit your post to make it more specific or focused on a particular problem, that would help you get specific answers. — whuber, Aug 23 '18 at 19:14
Interesting question, but I'm having trouble with motivation for it. MCMC is a classic example of constructing a Markov chain that converges to a known $p$. Often, the transition matrix is *nearly* sparse, in that a few entries tend to be way bigger than the rest. Yet, even if the matrix was sparse, it's hard to think of what advantage it would give. Doing iterations and finding eigenvalues (to infer convergence speed) would be faster. Yet, there's no reason for a sparse matrix to converge *faster* to the stationary distribution, which would likely be the whole point. — Alex R., Aug 31 '18 at 08:54
@AlexR. Here's one motivation: suppose $p$ is the concentration of metabolites and $P$ encodes metabolic pathways. You know a few of the reactions but not all, and sparsity is a reasonable Ansatz. More generally, suppose you can justify an Ansatz for sparse Markovian dynamics of some system but you can only measure the measure, and you want to know what the dynamics are. — S Huntsman, Aug 31 '18 at 12:30

score 1 · Answer 1 · answered Nov 28 '20 at 02:37

Some thoughts:

In some sense, the problem is only worth thinking about if the number of states $N$ (the question's $n$) is very large; otherwise one can sample directly from $p$.
In the case that $N$ is very large, one ought to make some structural assumptions on the state space which can limit the complexity of the Markov chain.
Commonly, these sampling tasks come imbued with some graphical structure, e.g. $[ N ]$ indexes the vertices of a graph, and the number of edges in that graph is not too large. A common assumption is that the underlying graph is sparse, i.e. each vertex has far fewer than $N$ neighbours.

Restricting to the problem of finding an optimal Markov chain on a graph which respects the graphical structure (i.e. one can only move from neighbour to neighbour), there has been some work (e.g. 1, 2) for the special case of Markov chains with the uniform distribution as invariant measure. While one could probably extend the methodology to account for non-uniform measures, my sense is that this is not really a practical way of designing sampling algorithms: for really large state spaces, it is unlikely that the optimisation problem itself can be solved efficiently.

A simpler case which has been analysed is: given a base Markov kernel $K_1$ with some invariant measure, find the simplest modification (in a precise sense) of $K_1$ which has $p$ as invariant measure. In 3, the authors define the distance

$$d (K_1, K_2 ) = \sum_{i = 1}^N \sum_{j \neq i} p(i) | K_1 (i, j) - K_2 (i, j)|$$

and show that, among the kernels $K_2$ which are reversible with respect to $p$ (a stronger condition than leaving $p$ invariant), the one closest to $K_1$ in $d$-distance is given by the 'Metropolisation' of $K_1$, i.e.

$$K_2 (i, j) = K_1 (i, j) \cdot \min \left(1, \frac{p(j) K_1 (j, i)}{p(i) K_1 (i, j)} \right) \quad \text{for } j \neq i,$$

with $K_2(i, i)$ defined such that $\sum_j K_2 (i, j) = 1$ for all $i$.
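
To make the construction concrete, here is a minimal sketch of the formula above for a small dense matrix. The names `metropolise` and `d`, and the three-state example (a lazy walk on a path with a made-up target $p$), are illustrative assumptions rather than anything taken from 3; it also assumes $p(i) > 0$ for all $i$.

```python
def metropolise(K1, p):
    """Metropolisation of a row-stochastic matrix K1 with respect to p.

    Illustrative helper (not from the cited paper); assumes p[i] > 0.
    """
    n = len(p)
    K2 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i and K1[i][j] > 0.0:
                # K1(i,j) * min(1, p(j) K1(j,i) / (p(i) K1(i,j))),
                # rewritten to avoid dividing by K1(i,j).
                K2[i][j] = min(K1[i][j], p[j] * K1[j][i] / p[i])
        # The diagonal absorbs the rejected mass so that the row sums to 1.
        K2[i][i] = 1.0 - sum(K2[i][j] for j in range(n) if j != i)
    return K2


def d(Ka, Kb, p):
    """The p-weighted off-diagonal distance between two kernels."""
    n = len(p)
    return sum(p[i] * abs(Ka[i][j] - Kb[i][j])
               for i in range(n) for j in range(n) if j != i)


# Hypothetical example: a walk on the path 0 - 1 - 2 (holding at the ends),
# whose own invariant measure is uniform, and a non-uniform target p.
p = [0.5, 0.3, 0.2]
K1 = [[0.5, 0.5, 0.0],
      [0.5, 0.0, 0.5],
      [0.0, 0.5, 0.5]]

K2 = metropolise(K1, p)
dist = d(K1, K2, p)

# Detailed balance p(i) K2(i,j) = p(j) K2(j,i), which implies p = p K2.
balanced = all(abs(p[i] * K2[i][j] - p[j] * K2[j][i]) < 1e-12
               for i in range(3) for j in range(3))
```

Note that the entry `K2[0][2]` stays zero because `K1[0][2]` is zero: the modification only ever shrinks off-diagonal entries and moves the removed mass onto the diagonal.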

A benefit of this result is that $K_2$ is immediately and efficiently implementable if i) $K_1$ can be sampled from, and ii) $K_1$ can be evaluated pointwise. Moreover, if $K_1$ is defined to respect a given graph structure, then $K_2$ will respect that same graph structure.
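
As a rough illustration of points i) and ii), one step of $K_2$ can be simulated without ever forming a matrix: draw a proposal from $K_1$, then accept it with the probability in the formula above. The sketch below is only an assumed toy setup (a nearest-neighbour walk on a cycle of `N = 5` states and an unnormalised target `weight`, both hypothetical); only ratios of $p$ are needed, so the normalising constant never appears.

```python
import random

N = 5  # hypothetical number of states, kept small so the result can be checked


def weight(i):
    """Unnormalised target: p(i) is proportional to i + 1 (made-up example)."""
    return i + 1.0


def sample_K1(i, rng):
    """Draw from K1(i, .): step to a neighbour on the cycle, each with prob. 1/2."""
    return (i + rng.choice((-1, 1))) % N


def eval_K1(i, j):
    """Pointwise evaluation of K1(i, j)."""
    return 0.5 if (j - i) % N in (1, N - 1) else 0.0


def mh_step(i, rng):
    """One step of the Metropolised kernel K2, started from state i."""
    j = sample_K1(i, rng)
    if j == i:
        return i
    ratio = weight(j) * eval_K1(j, i) / (weight(i) * eval_K1(i, j))
    # Accept the proposed move with probability min(1, ratio); otherwise stay.
    return j if rng.random() < min(1.0, ratio) else i


rng = random.Random(0)
steps = 100_000
counts = [0] * N
state = 0
for _ in range(steps):
    state = mh_step(state, rng)
    counts[state] += 1

freq = [c / steps for c in counts]            # empirical occupation frequencies
total = sum(weight(k) for k in range(N))
target = [weight(i) / total for i in range(N)]  # normalised p, for comparison
```

Since every accepted move is a move that $K_1$ itself proposed, the simulated chain only ever travels along edges of the cycle, which is the graph-respecting property mentioned above; `freq` should be close to `target` for a long enough run.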
