
Let $(X_n)_{n\in\mathbb N_0}$ denote a $\mathbb R^d$-valued Markov chain generated by the Metropolis-Hastings algorithm. Suppose I've run the algorithm on a computer and obtained a sample $x_0,\ldots,x_n$.

How can I compute the (truncated) autocorrelation of this sample?

I'm a bit unsure what quantity the autocorrelation actually is. What I really want is to compute $\operatorname{eff}_{\overline\theta}$ from the paper Efficient Metropolis Jumping Rules (equation $(1)$ on page 600).

Let $$A_n:=\frac1n\sum_{i=0}^{n-1}X_i\;\;\;\text{for }n\in\mathbb N$$ and $$\tau_n:=\frac12+\frac1{\operatorname{Var}[X_0]}\sum_{k=1}^{n-1}\left(1-\frac kn\right)\operatorname{Cov}[X_0,X_k]\;\;\;\text{for }n\in\mathbb N.$$ I know that $$\operatorname{Var}[A_n]=\frac{2\operatorname{Var}[X_0]}{n}\tau_n\;\;\;\text{for all }n\in\mathbb N\tag1$$ and $$n\operatorname{Var}[A_n]\xrightarrow{n\to\infty}\operatorname{Var}[X_0]+2\sum_{k=1}^\infty\operatorname{Cov}[X_0,X_k]\tag2.$$
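(For reference, $(1)$ is just the standard expansion of the variance of the average under stationarity:

$$\operatorname{Var}[A_n]=\frac1{n^2}\sum_{i,j=0}^{n-1}\operatorname{Cov}[X_i,X_j]=\frac1{n^2}\left(n\operatorname{Var}[X_0]+2\sum_{k=1}^{n-1}(n-k)\operatorname{Cov}[X_0,X_k]\right)=\frac{2\operatorname{Var}[X_0]}{n}\tau_n.)$$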

Maybe the quantity corresponding to equation $(1)$ from the paper is a truncated version of the right-hand side of $(2)$? In any case, how can we compute the right quantity for the sample $x_0,\ldots,x_n$ (in a numerically stable way)?

EDIT: I need to implement the computation in C++, but pseudocode for the computation would be sufficient for me.

0xbadf00d

1 Answer


There are two aspects to your question: (1) the link between the definition of $\tau_n$ and equation 2, and (2) how to compute the asymptotic variance. I won't give a formal proof, but will try to give the intuition.

First, note that if the chain is initialized from the stationary distribution (or, alternatively, if you have removed the warm-up/burn-in portion), then $X_0$ and $X_k$ are identically distributed and you can rewrite $\tau_n$ using correlations instead of covariances:

$$\tau_n=\frac12+\sum_{k=1}^{n-1}\left(1-\frac kn\right)\operatorname{Cor}[X_0,X_k]\;\;\;\text{for }n\in\mathbb N.$$

Let us assume that you have run your chain for very large $n$, so that your chain has mixed well. For $k$ large enough, $\operatorname{Cor}[X_0,X_k]\approx 0$ (the Markov chain "forgets its past"), so you can truncate the series

$$\tau_n\approx\frac12+\sum_{k=1}^{K}\left(1-\frac kn\right)\operatorname{Cor}[X_0,X_k]$$

for some $K\ll n$. For the terms that are kept, $\frac kn\ll1$, so that

$$\tau_n\approx\frac12+\sum_{k=1}^{K}\operatorname{Cor}[X_0,X_k]$$

which corresponds to the truncated version of the series in equation 2.

Finally, note that once the chain has reached stationarity, $\operatorname{Cor}[X_0,X_k] = \operatorname{Cor}[X_t,X_{t+k}]$ for all $t$. You therefore have a natural estimator of $\operatorname{Cor}[X_0,X_k]$: the empirical correlation between the vectors $(X_0, X_1, \ldots, X_{n-k})$ and $(X_k, X_{k+1}, \ldots, X_n)$. For $k\ll n$, these vectors are long enough that the estimate will be numerically stable.
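Since you ask for C++ in the question: here is a minimal self-contained sketch of this empirical estimate (the function name `autocorrelation` is my choice, not a standard API; it uses the global sample mean and variance, which is what R's acf does):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Empirical autocorrelation of x at lag k: the normalized covariance
// between (x[0], ..., x[n-k-1]) and (x[k], ..., x[n-1]), computed with
// the global sample mean and variance. Assumes a non-constant sample.
double autocorrelation(const std::vector<double>& x, std::size_t k) {
    const std::size_t n = x.size();

    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(n);

    double var = 0.0;  // un-normalized: sum of squared deviations
    for (double v : x) var += (v - mean) * (v - mean);

    double cov = 0.0;  // un-normalized lag-k covariance
    for (std::size_t t = 0; t + k < n; ++t)
        cov += (x[t] - mean) * (x[t + k] - mean);

    return cov / var;  // the lag-0 value is exactly 1
}
```

Both sums are left un-normalized because the $1/n$ factors cancel in the ratio.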

The numerical computation is now easy as long as you choose $K$ appropriately. A standard choice is the first lag at which the estimated autocorrelation becomes non-positive, i.e. $K=\min\left\{k:\widehat{\operatorname{Cor}}[X_0,X_k]\leq 0\right\}$.

For instance, here is a plot from an MCMC run, obtained with the R function acf; it shows $\operatorname{Cor}[X_t,X_{t+k}]$ against $k$. In this plot, I would truncate the sum at $K=100$ (and would usually only display the plot up to that point). In the code below, the first line draws the plot and the second computes $\tau$. (The $-0.5$ term is because acf returns the series starting at $k=0$ instead of $k=1$, so you need to subtract $\frac12$ instead of adding it.)

acf(X, lag.max=500)                               # plot the estimated autocorrelation function
tau = sum(acf(X, lag.max=100, plot=F)$acf) - 0.5  # integrated autocorrelation time tau
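The same computation can be sketched in C++ (`tau_hat` is a name I chose; it truncates the sum at the first non-positive estimated autocorrelation, following the rule for $K$ above, with `max_lag` as a safety cap):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Estimate tau = 1/2 + sum_{k=1}^{K} Cor[X_0, X_k], where K is the first
// lag with a non-positive estimated autocorrelation (capped at max_lag).
// Assumes a non-constant sample.
double tau_hat(const std::vector<double>& x, std::size_t max_lag) {
    const std::size_t n = x.size();

    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(n);

    double var = 0.0;  // sum of squared deviations (normalization cancels)
    for (double v : x) var += (v - mean) * (v - mean);

    double tau = 0.5;
    for (std::size_t k = 1; k <= max_lag && k < n; ++k) {
        double cov = 0.0;
        for (std::size_t t = 0; t + k < n; ++t)
            cov += (x[t] - mean) * (x[t + k] - mean);
        const double rho = cov / var;
        if (rho <= 0.0) break;  // truncation: stop at first non-positive lag
        tau += rho;
    }
    return tau;
}
```

This recomputes the empirical autocorrelation at each lag, which is $O(nK)$; fine for typical chain lengths, though an FFT-based approach is faster for very long chains.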

[Plot: estimated autocorrelation $\operatorname{Cor}[X_t,X_{t+k}]$ against lag $k$, decaying to $0$ by around $k=100$]

Robin Ryder
  • Thank you for your answer. (a) Why $\operatorname{Cor}[X_0,X_k]\approx 0$ for large $k$? (b) How do you numerically compute $\operatorname{Cor}[X_0,X_k]$? (c) In your plot, which target and proposal did you choose? – 0xbadf00d Mar 02 '19 at 22:01
  • I have updated the answer. (a) for large $k$, the chain has "forgotten its past", so the correlation goes to 0; (b) I have added the R code to the answer; (c) the plot corresponds to the estimate of a regression coefficient in a probit model, estimated using Metropolis-Hastings with a gaussian kernel. But the method is not specific to this example. – Robin Ryder Mar 03 '19 at 00:02
  • Unfortunately, the R code doesn't help me much since I need to implement it in C++. Could you provide some pseudocode for how $\operatorname{Cor}[X_0,X_k]$ is usually computed (in a stable way)? – 0xbadf00d Mar 03 '19 at 10:54
  • I think the argument you need is that $\forall t, Cor[X_0, X_k]=Cor[X_t, X_{t+k}]$ (once the chain has reached stationarity). From this, you can derive an unbiased estimate of $Cor[X_0, X_k]$ based on a sample of size $n-k$, which will be numerically stable. – Robin Ryder Mar 03 '19 at 20:55
  • @Taylor I'm still unsure how I need to compute the autocorrelation. Suppose `x[0], ..., x[n - 1]` are my Markov chain samples and let `mean` denote their mean. Should I then simply compute `s = 0; for (i = 1; i < n; ++i) { s += (x[0] - mean) * (x[i] - mean); }` and then return `1 / (1 + 2 * s)`? Is this what is done for Table 1 of the paper? – 0xbadf00d Mar 04 '19 at 22:21
  • @Taylor Thank you for the source file, it is clear to me now. In the paper of Roberts, he writes that he used "the method of batching" for the estimation of $\operatorname{eff}_{\overline\theta_i}$ in Table 1. Could you explain to me what that actually means? By the code you've provided, we can compute the autocorrelation of a one-dimensional set of samples at any lag. How do we estimate $\operatorname{eff}_{\overline\theta_i}$ from that? – 0xbadf00d Mar 12 '19 at 21:49
  • @0xbadf00d can you put that into a separate question? I’d have more room to take a stab at it, and I am certain that there are many others on this site that are very qualified to answer a question like that as well. – Taylor Mar 12 '19 at 21:51
  • @Taylor Done: https://stats.stackexchange.com/q/397263/222528 – 0xbadf00d Mar 13 '19 at 11:21