
The background of my study:

In Gibbs sampling, we alternately sample $X$ (the variable of interest) from $P(X|Y)$ and $Y$ from $P(Y|X)$, where $X$ and $Y$ are $k$-dimensional random vectors. The process is usually split into two stages:

  1. Burn-in period, during which we discard all the samples; denote them by $X_1,\dots,X_t$ and $Y_1,\dots,Y_t$.
  2. "After-burn-in" period, during which we average the samples, $\bar{X} = \frac{1}{k}\sum_{i=1}^k X_{t+i}$, as our final desired result.

However, the samples in the after-burn-in sequence $X_{t+1},\dots,X_{t+k}$ are not independent. Therefore, if I want to inspect the variance of the final result, it becomes

$$\operatorname{Var}[\bar{X}] = \frac{1}{k^2}\operatorname{Var}\left[\sum_{i=1}^k X_{t+i}\right] = \frac{1}{k^2}\left(\sum_{i=1}^k\operatorname{Var}[X_{t+i}] + \sum_{i=1}^{k-1} \sum_{j=i+1}^k \left(\operatorname{Cov}[X_{t+i},X_{t+j}] + \operatorname{Cov}[X_{t+i},X_{t+j}]'\right)\right)$$

Here $\operatorname{Cov}[X_{t+i},X_{t+j}]$ is a $k\times k$ cross-covariance matrix ($k$ being the dimension of the vectors), one for each pair $(i,j)$ with $i<j$.
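In code, the estimator I have in mind looks roughly like this (a pure-Python sketch; the helper names are mine, and stationarity of the after-burn-in draws is assumed, so a single pooled cross-covariance per lag stands in for all pairs at that lag):

```python
# Sketch: variance of the Gibbs-sample mean including lagged cross-covariance
# terms. `samples` stands for the after-burn-in draws X_{t+1}, ..., X_{t+k},
# each a length-d list. All names here are illustrative, not an existing API.

def mean_vec(samples):
    k, d = len(samples), len(samples[0])
    return [sum(x[j] for x in samples) / k for j in range(d)]

def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(A, c):
    return [[c * a for a in row] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def cross_cov(samples, lag):
    """Pooled estimate of Cov[X_{t+i}, X_{t+i+lag}] under stationarity."""
    mu = mean_vec(samples)
    n_pairs = len(samples) - lag
    acc = [[0.0] * len(mu) for _ in mu]
    for i in range(n_pairs):
        d_i = [a - m for a, m in zip(samples[i], mu)]
        d_j = [a - m for a, m in zip(samples[i + lag], mu)]
        acc = mat_add(acc, outer(d_i, d_j))
    return mat_scale(acc, 1.0 / n_pairs)

def var_of_mean(samples, max_lag):
    """(1/k^2) * (sum of Var terms + lagged Cov terms and their transposes)."""
    k = len(samples)
    total = mat_scale(cross_cov(samples, 0), k)  # sum of the k Var terms
    for lag in range(1, max_lag + 1):
        C = cross_cov(samples, lag)
        # each lag appears in (k - lag) pairs (i, j), plus the transposed terms
        total = mat_add(total, mat_scale(mat_add(C, transpose(C)), k - lag))
    return mat_scale(total, 1.0 / k**2)
```

Truncating the loop at `max_lag` already anticipates question 2 below: lags beyond some point are dropped.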

For example, I have

$$X_{t+1} = (1,2,1)'\\ X_{t+2} = (1,0,2)'\\ X_{t+3} = (1,0,0)'\\ X_{t+4} = (5,0,-1)'$$

then I could estimate the lag-one cross-covariance matrix $\operatorname{Cov}[X_{t+i}, X_{t+i+1}]$ with

$$ \frac{1}{3}\sum_{i=1}^3 (X_{t+i}-\bar{X})(X_{t+i+1}-\bar{X})', $$

where $\bar{X}$ is the overall sample mean, standing in for the common mean under stationarity.

Now I am interested in whether the resulting estimate is significantly non-zero, so that I know whether to include it in my variance estimate of $\operatorname{Var}[\bar{X}]$.
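For the toy draws above, this estimate can be computed directly (a minimal sketch, using the overall sample mean in place of the unknown common mean):

```python
# Lag-1 cross-covariance estimate for the four toy draws X_{t+1}..X_{t+4}.
X = [[1, 2, 1], [1, 0, 2], [1, 0, 0], [5, 0, -1]]

d = len(X[0])
mu = [sum(x[j] for x in X) / len(X) for j in range(d)]          # overall mean
dev = [[xj - mj for xj, mj in zip(x, mu)] for x in X]           # deviations

# (1/3) * sum_i dev_i dev_{i+1}'  -> a 3x3 cross-covariance estimate
C1 = [[sum(dev[i][r] * dev[i + 1][c] for i in range(3)) / 3
       for c in range(d)] for r in range(d)]

print(C1[0][0])  # -> -0.333..., a non-zero entry
```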

So here come my questions:

  1. We sample $X_{t+i}$ from $P(X_{t+i}|Y_{t+i})$. Since $Y_{t+i}$ changes from step to step, I think $X_{t+i}$ and $X_{t+i+1}$ are not from the same distribution, so $\operatorname{Cov}[X_{t+i},X_{t+j}]$ is not the same as $\operatorname{Cov}[X_{t+i},X_{t+i}]$. Is this statement correct?
  2. Suppose I have enough data to estimate $\operatorname{Cov}[X_{t+i},X_{t+i+1}]$ (neighboring samples in the sequence): is there any way to test whether this cross-covariance matrix is significantly non-zero? Broadly speaking, I am interested in an indicator that points me to the meaningful cross-covariance matrices that should be included in my final variance estimate.
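To make the kind of indicator I have in mind concrete, here is a rough per-coordinate sketch (an approximate $\pm 1.96/\sqrt{n}$ white-noise band on the ACF, as on a standard correlogram; illustrated on a toy AR(1) chain rather than my actual sampler):

```python
# Rough per-coordinate indicator: flag lags whose sample autocorrelation
# falls outside the approximate white-noise band +/- 1.96/sqrt(n).
import math
import random

def acf(series, lag):
    n = len(series)
    m = sum(series) / n
    var = sum((s - m) ** 2 for s in series) / n
    cov = sum((series[i] - m) * (series[i + lag] - m)
              for i in range(n - lag)) / n
    return cov / var

random.seed(0)
# toy AR(1) chain standing in for one coordinate of the Gibbs draws
y, chain = 0.0, []
for _ in range(5000):
    y = 0.8 * y + random.gauss(0.0, 1.0)
    chain.append(y)

band = 1.96 / math.sqrt(len(chain))
flagged = [lag for lag in range(1, 21) if abs(acf(chain, lag)) > band]
print(flagged)  # low lags get flagged for this strongly correlated chain
```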
  • If the population covariance matrix is entirely zero (including the diagonals), the sample covariances must all be zero. – Glen_b Jan 06 '16 at 03:12
  • @Glen_b Is that also true for $Cov[X,Y] = E[XY]-E[X]E[Y]$? i.e. for covariance between two different variables. In my question it is for $X_{t+i}$ and $X_{t+i+1}$. – TomHall Jan 06 '16 at 19:16
  • Hi Tom -- how does that change anything? $X_r$ and $X_s$ (for $s\neq r$) are "two different variables"; the process observed at two different times. – Glen_b Jan 06 '16 at 23:57
  • Hi @Glen_b, thank you for the comments. May I raise another question here? For $s\neq r$, we sample $X_s$ and $X_r$ from $P(X_s|Y_s)$ and $P(X_r|Y_r)$ where $Y$ is another random variable. Does it mean that $X_s$ and $X_r$ are differently distributed? – TomHall Jan 07 '16 at 02:15
  • Tell you what -- add that to your question, so it's no longer something with a one-sentence answer, and I'll try to answer both. – Glen_b Jan 07 '16 at 02:18
  • Are you talking about covariance matrix or rather about a *cross-covariance* matrix? Perhaps I don't fully understand the notation: is $X_{t+i}$ a number or a vector? What's the size of your $\mathrm{Cov}[X_{t+i}, X_{t+i+1}]$? What are its diagonal elements, are these variances or covariances? If it's cross-covariance matrix, then what @Glen_b said above about being entirely zero does not apply. – amoeba Jan 07 '16 at 10:27
  • @amoeba -- that's a good point; if it's not a full covariance matrix (as I assumed) then the diagonals won't be variances. That would invalidate my comment; it needs to be made clear. – Glen_b Jan 07 '16 at 12:45
  • Hi @amoeba, I am sorry for the ambiguity in the question. Some details are added. – TomHall Jan 07 '16 at 22:59
  • Actually, now this looks like quite a good question; I think some other people will be better placed to give good answers than me, so I'd like to promote this (place a bounty on it) when it becomes eligible shortly. [Short answers: 1. Those two covariances are different. 2. You don't need to *test* whether consecutive variates are correlated (in all but the most trivial cases they are; the algorithm works by generating dependent variates) -- more interesting to measure the correlation than test it;] ... if good answers don't show up I'll expand those short comments into a full answer – Glen_b Jan 07 '16 at 23:30
  • It seems that your question is much broader than your title question. Specifically addressing your title question, there is Bartlett's test of sphericity that allows to test if a sample covariance matrix is diagonal. You probably would need to adapt it to your cross-covariance scenario (your "covariance matrix" is actually not really covariance matrix, it's a cross-covariance matrix; it's an off-diagonal block of the full covariance matrix of both X_t and X_{t+1} together). CC to @Glen_b. – amoeba Jan 07 '16 at 23:44
  • @Glen_b Thanks, I will think about your short answers, and probably put a bounty on it ;) – TomHall Jan 07 '16 at 23:57
  • @amoeba I just learnt the term "cross-covariance" from you :) Thanks for your suggestion. I will try to expand it but not sure if I can make it. – TomHall Jan 08 '16 at 00:00
  • I would add that covariances tend to decay more or less geometrically (increasingly so as you move further apart); values far apart in time tend to have very low correlation (*not* zero but largely ignorable) while those close together can sometimes be quite dependent. – Glen_b Jan 08 '16 at 00:11
  • @Glen_b I observed similar behaviour in my simulation. However in my limited experiences in time series analysis, the "autocorrelation" could happen, say, at lag 1 and 4 ( but jump over lag 2 and 3). I feel it is kind of similar to the autocorrelation scenario. – TomHall Jan 08 '16 at 00:20
  • @Tom 1. Nevertheless, with stationary series, at very distant lags (4 is not distant!), what happens to the ACF? $\quad$ 2. You know something about how generated values from MCMC works that you can't say about arbitrary time series ... they're [*Markovian*](https://en.wikipedia.org/wiki/Markov_property). You'll note that my earlier comments don't claim that the closest lags must show geometric decay (e.g. I didn't say it was impossible to see a higher correlation at lag 4 than 3). You will still get (if certain conditions hold) tendency to geometric decay in the ACF as you move far apart. – Glen_b Jan 08 '16 at 00:37
  • @Glen_b I totally agree with you. Besides in my case, the longer the lag, the fewer data I could use to make the estimation, thus I have to restrict the number of $(i,j)$ pairs to consider. – TomHall Jan 08 '16 at 00:56
  • Uh... how small is your $k$ (or how big are your early correlations) that this could be an issue? e.g. consider an AR(1) (often - but not always - a reasonable approximation for MCMC) ... with $\rho = 0.95$ and a lag of 100... the correlation is only about 0.005 at that lag. ... Oh, wait, are you trying to figure out a *skip* interval? – Glen_b Jan 08 '16 at 01:23
  • @Glen_b Oh, I am not trying to figure out a skip interval, my purpose remains to estimate the variance reasonably. In my scenario the after-burn-in period might between 50~100, and a reasonable maximum lag to consider might be less than 20. I just feel estimating a covariance matrix needs more data so I don't want to look into long lags. – TomHall Jan 08 '16 at 02:17
  • Why such a short sampling period? (I'd more usually take a hundred times that length. Occasionally as few as 20 times that.) – Glen_b Jan 08 '16 at 03:59
  • @Glen_b It could be longer, the reason is half efficiency half "good-enough". The current result is useful enough so I would rather consider how to include the cross-covariance than speeding up my code. – TomHall Jan 08 '16 at 04:04
  • If your sampling period is so short you don't have highly accurate estimates of the cross covariance, you may just have to deal with the fact that your estimates of the cross-covariance terms have largish standard error. Given my current understanding I'm even more strongly going to reaffirm my objection to testing the correlations. Hypothesis testing for zero vs non-zero correlations doesn't address your problem here. – Glen_b Jan 08 '16 at 04:08
  • @Glen_b May I know more details about "test the correlation"? Or I could state in this way: I am interested in an indicator which guides me to some _meaningful_ cross-covariance matrices. – TomHall Jan 08 '16 at 04:22
  • Your problem is one of *estimation*, not testing (you know the null hypothesis is false!). – Glen_b Jan 08 '16 at 04:23
  • @Glen_b Can I say that for $Cov[X_1, X_{1000}]$ I am actually not sure about the null hypothesis? The estimation accuracy might be ideal as I speed up my code and run for thousands iterations. – TomHall Jan 08 '16 at 04:26
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/34002/discussion-between-glen-b-and-tomhall). – Glen_b Jan 08 '16 at 04:34

1 Answer

  1. We sample $X_{t+i}$ from $P(X_{t+i}|Y_{t+i})$. Since $Y_{t+i}$ is changing, I think $X_{t+i}$ and $X_{t+i+1}$ are not from the same distribution [...]

You are confusing conditional and unconditional distributions here; see also my next remark. Conditional on $Y_{t+i} = y_1$ and $Y_{t+i+1} = y_2$, we indeed have $P(X_{t+i}|Y_{t+i} = y_1) \neq P(X_{t+i+1}|Y_{t+i+1} = y_2)$ in general. But the entire point of constructing your Gibbs sampler is to sample from the stationary distributions of $X$ and $Y$. Very roughly speaking, once you have run your chain long enough that $\{Y_t\}$ follows the stationary distribution, you can say \begin{align} P(X_t) = \int_{\mathcal{Y}}P(X_t|Y_t)\,dP(Y_t), \end{align} meaning that the unconditional distribution of $X_t$ is also invariant. In other words, as $t \to \infty$ and we converge to the stationary distributions, $P(X_{t+i}) = P(X_{t+i+1})$, since $Y_{t+i}$ and $Y_{t+i+1}$ are asymptotically drawn from (the same!) stationary distribution $P(Y_t)$. On the other hand, once we condition on $Y_{t+i} = y_1$ and $Y_{t+i+1} = y_2$, this no longer holds, regardless of how large $t$ is.

[...] so $\operatorname{Cov}[X_{t+i},X_{t+j}]$ is not the same as $\operatorname{Cov}[X_{t+i},X_{t+i}]$. Is this statement correct?

Yes, this is correct, even though $X_t$ and $X_{t+1}$ share the same stationary distribution. I know this may be confusing, but bear with me. Define $Y_t = 0.8\cdot Y_{t-1} + \varepsilon_t$ with $\varepsilon_t \overset{iid}{\sim} N(0,1)$. By iterated substitution, one can show that $Y_t = \sum_{i=0}^t0.8^i \varepsilon_{t-i}$, and since sums of normals are still normal, it holds that $\text{Var}(Y_t) = \sum_{i=0}^t0.8^{2i} \to \dfrac{1}{1-0.8^2}$, so that asymptotically $Y_t \sim N\left(0, \dfrac{1}{1-0.8^2}\right)$ for every $t$: the $Y_t$ are identically, though not independently, distributed. Clearly, $Y_t$ and $Y_{t+1}$ are correlated, yet they come from the same distribution. A similar situation holds for your $X_t$.
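A quick simulation illustrates this (a toy Python check using only the constants from the example; `chain` and the sample sizes are mine):

```python
# Numerical check of the AR(1) example: Y_t = 0.8*Y_{t-1} + eps_t has
# stationary variance 1/(1-0.8^2) = 2.777..., and Y_t, Y_{t+1} are
# strongly correlated (~0.8) despite sharing the same marginal distribution.
import random

random.seed(1)
y, chain = 0.0, []
for _ in range(200000):
    y = 0.8 * y + random.gauss(0.0, 1.0)
    chain.append(y)
chain = chain[1000:]  # crude burn-in

n = len(chain)
m = sum(chain) / n
var_hat = sum((c - m) ** 2 for c in chain) / n
corr1 = sum((chain[i] - m) * (chain[i + 1] - m)
            for i in range(n - 1)) / (n * var_hat)

print(var_hat)  # close to 1/(1-0.64) = 2.777...
print(corr1)    # close to 0.8
```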

  2. Suppose I have enough data to estimate $\operatorname{Cov}[X_{t+i},X_{t+i+1}]$ (neighboring samples in the sequence): is there any way to test whether this cross-covariance matrix is significantly non-zero? Broadly speaking, I am interested in an indicator that points me to the meaningful cross-covariance matrices that should be included in my final variance estimate.

Well, if you had infinitely many observations, all the terms would eventually come out significant. Clearly, you cannot do this in practice, but there are ways of 'chopping off' the expansion after some terms; see the excellent accepted answer here. Basically, you define a kernel $k(\cdot)$ that decays to $0$ and assigns weights to the first $l_T$ covariance matrices you could compute. If you want to choose $l_T$ in a principled way, you will have to dig a bit into the literature, but the post linked above gives you some good references for doing exactly that.
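For a scalar chain, the idea looks roughly like this (a sketch using the Bartlett kernel, one common choice; in your vector case each $\gamma$ becomes a cross-covariance matrix plus its transpose):

```python
# Sketch of the 'chop and down-weight' idea: a Bartlett-kernel long-run
# variance estimate for a scalar chain. The truncation lag max_lag plays
# the role of l_T; both it and the kernel are tuning choices.
import random

def long_run_variance(series, max_lag):
    n = len(series)
    m = sum(series) / n
    def gamma(lag):  # sample autocovariance at the given lag
        return sum((series[i] - m) * (series[i + lag] - m)
                   for i in range(n - lag)) / n
    lrv = gamma(0)
    for lag in range(1, max_lag + 1):
        w = 1.0 - lag / (max_lag + 1.0)   # Bartlett weights decay to 0
        lrv += 2.0 * w * gamma(lag)
    return lrv

# demo on an AR(1) chain with rho = 0.8, whose true long-run
# variance is 1/(1-0.8)^2 = 25
random.seed(2)
y, chain = 0.0, []
for _ in range(100000):
    y = 0.8 * y + random.gauss(0.0, 1.0)
    chain.append(y)
lrv_hat = long_run_variance(chain, 50)
print(lrv_hat)  # roughly 25 (slightly biased downward by the truncation)
```

$\operatorname{Var}[\bar{X}]$ is then approximately `long_run_variance(chain, max_lag) / n`.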
