
I want to derive a Bayesian learning procedure where I don't only learn from my own signal, but also from other signals that are correlated with mine.

I thought it could simply work with Bayesian inference about the mean of a multivariate normal distribution (with known covariance matrix $\Sigma$), but I'm quite puzzled by the result. As you may know, if one starts from a prior $(\mu_0, \Sigma_0)$, the posterior is given by (cf. this thread):

$\mu_n = \Sigma_0 \Big( \Sigma_0 + \frac{1}{n} \Sigma \Big)^{-1} \Big( \frac{1}{n} \sum^n_{i=1} \mathbf{x}_i \Big) + \frac{1}{n} \Sigma \Big( \Sigma_0 + \frac{1}{n} \Sigma \Big)^{-1} \mu_0$
$\Sigma_n = \Sigma_0 \Big( \Sigma_0 + \frac{1}{n} \Sigma \Big)^{-1} \frac{1}{n} \Sigma$
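
For concreteness, here is a minimal numpy sketch of these two updates (the function and variable names are mine, not from that thread):

```python
import numpy as np

def normal_mean_posterior(mu0, Sigma0, Sigma, X):
    """Posterior over the mean of a multivariate normal with known
    covariance Sigma, starting from the prior N(mu0, Sigma0).
    X has shape (n, d): n observations of the d-dimensional signal."""
    n = X.shape[0]
    xbar = X.mean(axis=0)                      # (1/n) * sum_i x_i
    A = np.linalg.inv(Sigma0 + Sigma / n)      # (Sigma0 + Sigma/n)^{-1}
    mu_n = Sigma0 @ A @ xbar + (Sigma / n) @ A @ mu0
    Sigma_n = Sigma0 @ A @ (Sigma / n)
    return mu_n, Sigma_n
```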

Now, what puzzles me is that, if I start from a prior $\Sigma_0 = \Sigma$, I end up in a situation where I only learn about the mean of the first variable using observations of the first variable!
Indeed, in this case $\Sigma_0 + \frac{1}{n} \Sigma = \frac{n+1}{n} \Sigma$, hence $\Sigma_0 \Big( \Sigma_0 + \frac{1}{n} \Sigma \Big)^{-1} = \frac{n}{n+1} I$, so the posterior mean reduces to the componentwise shrinkage $\mu_n = \frac{n}{n+1} \bar{\mathbf{x}} + \frac{1}{n+1} \mu_0$. I don't get why one does not use the additional information contained in the correlated signals.
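
A quick numeric check of this (the numbers are arbitrary):

```python
import numpy as np

# With Sigma0 = Sigma, the weight on the sample mean collapses to
# (n/(n+1)) * I, so the posterior mean of x1 depends only on
# observations of x1, however strong the correlation in Sigma is.
n = 10
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])   # strongly correlated signals
W = Sigma @ np.linalg.inv(Sigma + Sigma / n)
print(np.allclose(W, n / (n + 1) * np.eye(2)))   # True
```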

It's pretty much the same if one starts from an uninformative prior ($\Sigma_0 \rightarrow \infty$), since after one observation $\Sigma_1 \rightarrow \Sigma$.

So I don't really get the intuition behind it. Imagine $x$ is just bivariate, $x = (x_1, x_2)'$. Since I know that $x_2$ is correlated with $x_1$, why can't I use observations of $x_2$ to learn about the mean of $x_1$? I would just learn "less perfectly" than by observing my own signal.
By the way, in the extreme case where I assume that $\Sigma_0$ is a matrix full of $1$s (or of any constant, i.e. prior correlation $= 1$), I do end up learning from every variable (as if I only had one variable). But this breaks down as soon as covariance $\neq$ variance in the prior.
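
To illustrate that extreme case (setting $\Sigma$ to the identity is an arbitrary choice of mine):

```python
import numpy as np

# Rank-one prior Sigma0 = c * (matrix of ones): prior correlation 1,
# i.e. all means are believed to move together. The weight matrix then
# has identical entries, so every component of the posterior mean pools
# the sample means of all variables.
n, c = 10, 5.0
Sigma = np.eye(2)                    # illustrative choice
Sigma0 = c * np.ones((2, 2))
W = Sigma0 @ np.linalg.inv(Sigma0 + Sigma / n)
print(W)   # all four entries equal (= 50/101 here): x2 now moves mu1
```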

There's probably something I'm missing here, but I don't know what. Maybe it's because the covariance is "net of the mean", so knowing it does not help in learning the mean, or something along those lines.

By the way, since Bayesian inference about the mean of a multivariate normal does not seem to be the right way to go, do you know of another example where one learns about the mean of a variable $x_1$ by observing another variable $x_2$ that is merely correlated with it?

G. Ander
  • Is $\Sigma$ your observed correlation matrix? If so, how can the prior equal it? – jbowman Aug 08 '19 at 16:33
  • $\Sigma$ is the known covariance matrix between the variables. You just know it (the standard case where you learn about the mean with known variance). Your prior is not necessarily equal to it, but I generally start with an uninformative prior, in which case I end up with a first posterior equal to $\Sigma$ directly. – G. Ander Aug 08 '19 at 16:35

0 Answers