
Suppose I want to update my beliefs about the realization of a vector $\theta = (\theta_{1},...,\theta_{d}) \in \mathbb{R}^{d}$, where for each $\theta_{j}$, in each period $t$ I may receive a signal $y_{jt} = \theta_{j} + \epsilon_{jt}$.

$\theta$ is normally distributed with mean $\mu_{\theta} \in \mathbb{R}^{d}$ and covariance matrix $\Sigma_{\theta} \in \mathbb{R}^{d\times d}$, the latter allowing for positive correlation between the $\theta_{j}$'s. Further, $\epsilon_{jt}$ is an iid normal shock with mean 0 and variance $\sigma_{\epsilon}^{2}$. Under these assumptions, $y_{t} \sim \mathcal{N}(\mu_{\theta},\Sigma_{\theta} + \sigma_{\epsilon}^{2} \mathbb{I}_{d})$.
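To fix ideas, here is a minimal simulation of this setup (the numerical values of $\mu_{\theta}$, $\Sigma_{\theta}$ and $\sigma_{\epsilon}$ below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

d = 3
mu_theta = np.zeros(d)
Sigma_theta = np.array([[1.0, 0.5, 0.2],
                        [0.5, 1.0, 0.3],
                        [0.2, 0.3, 1.0]])    # allows positive correlation between the theta_j's
sigma_eps = 0.5

theta = rng.multivariate_normal(mu_theta, Sigma_theta)   # one realization of theta
y_t = theta + rng.normal(0.0, sigma_eps, size=d)         # signals y_jt = theta_j + eps_jt

# Unconditionally, y_t ~ N(mu_theta, Sigma_theta + sigma_eps^2 * I_d)
Sigma_y = Sigma_theta + sigma_eps**2 * np.eye(d)
```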

If we observed all signals $y_{t} = (y_{1t},...,y_{dt}) \in \mathbb{R}^{d}$, then characterizing the mean and covariance of the posterior normal distribution is straightforward, assuming that prior $\pi(\theta)$ is exactly the distribution of $\theta$.
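Concretely, with all $d$ signals observed in a single period, the standard conjugate update (written here in precision form, just for reference) gives

\begin{align*} \mathbb{V}(\theta | y_{t}) & = \big(\Sigma_{\theta}^{-1} + \sigma_{\epsilon}^{-2}\mathbb{I}_{d}\big)^{-1}, \\ \mathbb{E}(\theta | y_{t}) & = \big(\Sigma_{\theta}^{-1} + \sigma_{\epsilon}^{-2}\mathbb{I}_{d}\big)^{-1}\big(\Sigma_{\theta}^{-1}\mu_{\theta} + \sigma_{\epsilon}^{-2} y_{t}\big). \end{align*}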

My question is the following: suppose that for some reason we do not observe all of the signals, only a subset of the entries of $y_{t}$. Is there a way to obtain an analytic expression for the posterior mean and covariance matrix as a function of the signals we do observe?

For example, assume that $d=3$ and you only observe signals $y_{1t}$ and $y_{3t}$. To derive $\pi(\theta|y_{1t},y_{3t})$, you would need to compute the likelihood $L(y_{1t},y_{3t} | \theta)$, which is the joint distribution of $y_{1t}$ and $y_{3t}$ given $\theta$, with $y_{2t}$ marginalized out. However, I am having trouble deriving this likelihood for an arbitrary combination of observed signals.
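To be clear, for a fixed observation pattern like this one I can write everything down by indexing the mean vector and the covariance matrix. Continuing the simulation sketch above (the index names are just illustrative):

```python
import numpy as np

# Marginal of the observed signals (y_1t, y_3t): for a multivariate normal,
# the marginal over a subset of coordinates is obtained by selecting the
# corresponding entries of the mean and rows/columns of the covariance.
obs = np.array([0, 2])                       # 0-based indices of the observed signals

mu_obs = mu_theta[obs]                       # mean of (y_1t, y_3t)
Sigma_obs = Sigma_y[np.ix_(obs, obs)]        # covariance of (y_1t, y_3t)

# Conditional on theta, the observed signals are independent draws,
#   (y_1t, y_3t) | theta ~ N((theta_1, theta_3), sigma_eps^2 * I_2),
# so the likelihood factors over the observed coordinates.
```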

Define an indicator vector $d_{t} \in \{0,1\}^{d}$ such that $d_{jt} = 1$ if and only if we observe signal $y_{jt}$. Is there a way to characterize the posterior mean and variance as a function of this vector $d_{t}$?

Even if you do not know the answer, I would appreciate if you could point me to literature that might be useful. Thanks!

eljsrr

2 Answers


Let's drop the $t$ subscripts for simplicity of notation. I'm also going to use $\omega \in \{0, 1\}^d$ to denote the indicator vector of the observed coordinates (to avoid confusing it with the dimension of the space, which we're calling $d$). So, in your example with $y_1$ and $y_3$ being observed and $y_2$ not being observed, we would have $\omega = (1, 0, 1)$. We're supposing that the indicator $\omega$ is given, so I'm not going to write it in every conditional.

Note that the observed vector $z$ is given by the entry-wise product $z = \omega \odot y$, where we fill in all the non-observed $y_j$'s with the value $0$ so that we still have a vector in $\mathbb{R}^d$. These non-observed coordinates now take deterministic values, which precludes $z|\theta$ from having a density function with respect to the Lebesgue measure on $\mathbb{R}^d$. But with the right dominating measure (the one ignoring the known, $0$-valued coordinates), this is not an issue: given $\theta$, the observed coordinates of $z$ are independent $\mathcal{N}(\theta_j, \sigma_\epsilon^2)$ draws and the remaining coordinates are identically $0$, so $z$ follows a degenerate multivariate normal distribution. The matrix $\Omega = \text{Diag}(\omega /\sigma_\epsilon^2 )$ collects the corresponding precisions; it is not full rank, so it is not the inverse of any covariance matrix on all of $\mathbb{R}^d$. Once you recognize this fact, the computation looks like the Bayesian update formula for a normal likelihood and a normal prior. Writing $\Sigma = \Sigma_\theta$ and using Bayes' Theorem:
\begin{align}
p(\theta | z) & \propto p(z | \theta)\, p(\theta) \\
& = \prod_{\{ j \colon \omega_j = 1 \}} \mathcal{N}(\theta_j, \sigma_\epsilon^2)(y_j) \times \mathcal{N}(\mu_\theta, \Sigma_\theta)(\theta) \\
& \propto \exp\left( \frac{- \sum_{\{j \colon \omega_j = 1\}}(y_j - \theta_j)^2}{2\sigma_\epsilon^2} \right) \times \exp\left( \frac{-1}{2} (\theta - \mu_\theta)^T \Sigma^{-1} (\theta - \mu_\theta) \right) \\
& = \exp\left( \frac{-1}{2} \left( (y - \theta)^T \Omega (y - \theta) + (\theta - \mu_\theta)^T \Sigma^{-1} (\theta - \mu_\theta) \right) \right) \\
& \propto \exp\left( \frac{-1}{2} \left( \theta^T (\Omega + \Sigma^{-1}) \theta - 2 (y^T \Omega + \mu_\theta^T \Sigma^{-1}) \theta \right) \right) \\
& \propto \exp\left( \frac{-1}{2} (\theta - \mu_1)^T (\Omega + \Sigma^{-1}) (\theta - \mu_1) \right) \text{ , where } \mu_1 = (\Omega + \Sigma^{-1})^{-1} (y^T \Omega + \mu_\theta^T \Sigma^{-1})^T \\
& \propto \mathcal{N}\big(\mu_1, (\Omega + \Sigma^{-1})^{-1}\big)(\theta),
\end{align}
where the second-to-last line completes the square in $\theta$. So the posterior distribution looks essentially identical to the standard Bayesian update for a fully observed normal signal. You can now repeat this for a sequence of observations indexed by $t$, and you can simplify the resulting expressions using the Woodbury matrix formula if you want, but I leave that as an exercise for you.
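For concreteness, here is a minimal numerical sketch of this update (the function name and the NumPy interface are just illustrative choices, not part of the derivation above):

```python
import numpy as np

def posterior_given_partial_signals(y, omega, mu_theta, Sigma_theta, sigma_eps):
    """Posterior mean and covariance of theta given the observed entries of y.

    omega is the 0/1 indicator vector: omega[j] = 1 means y[j] was observed.
    Unobserved entries of y can hold any placeholder value; they get zero weight.
    """
    Omega = np.diag(omega) / sigma_eps**2          # Diag(omega / sigma_eps^2)
    Sigma_inv = np.linalg.inv(Sigma_theta)
    post_cov = np.linalg.inv(Omega + Sigma_inv)    # (Omega + Sigma^{-1})^{-1}
    post_mean = post_cov @ (Omega @ y + Sigma_inv @ mu_theta)
    return post_mean, post_cov

# Example: d = 3 with only y_1 and y_3 observed, so omega = (1, 0, 1).
# Because Sigma_theta has nonzero off-diagonal entries, the posterior mean of
# theta_2 moves away from its prior mean even though y_2 was not observed.
mu1, cov1 = posterior_given_partial_signals(
    y=np.array([0.7, 0.0, -0.2]),                  # the value in slot 2 is a placeholder
    omega=np.array([1.0, 0.0, 1.0]),
    mu_theta=np.zeros(3),
    Sigma_theta=np.array([[1.0, 0.5, 0.2],
                          [0.5, 1.0, 0.3],
                          [0.2, 0.3, 1.0]]),
    sigma_eps=0.5,
)
```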

Eric Perkerson
  • Thank you very much @ericperkerson ! However, I realized that my question wasn't exactly what I am looking for. The question that I had in mind is exactly the one in [here](https://stats.stackexchange.com/questions/421288/multivariate-bayesian-inference-learnig-about-the-mean-of-a-variable-by-observi/457218#457218), where every entry of the posterior mean is updated in a "hey I didn't observe a signal for 1 but I did for 2 and I know 1 and 2 are correlated to some degree" fashion. Any idea on how to do this? – eljsrr Apr 01 '20 at 16:39

Just a small simplification of the posterior mean from the previous solution: since $\Omega$ and $\Sigma^{-1}$ are symmetric, it can be written more directly as

\begin{align*} \mu_{1} & =(\Omega + \Sigma^{-1})^{-1} (y^{T} \Omega + \mu_{\theta}^{T} \Sigma^{-1})^{T} \\ & = (\Omega + \Sigma^{-1})^{-1} (\Omega y + \Sigma^{-1} \mu_{\theta}) \end{align*}

As for the generalization to an arbitrary sequence of incompletely observed signals, the result is as follows (in case it's useful for someone):

\begin{align*} \mathbb{E}(\theta | \{y_{i},\omega_{i}\}_{i=1}^{N} ) & = \Big(\Sigma^{-1} + \sum_{i=1}^{N} \Omega_{i} \Big)^{-1} \Big(\Sigma^{-1} \mu_{\theta} + \sum_{i=1}^{N} \Omega_{i} y_{i} \Big) \\ \mathbb{V}(\theta | \{y_{i},\omega_{i}\}_{i=1}^{N} ) & = \Big(\Sigma^{-1} + \sum_{i=1}^{N} \Omega_{i} \Big)^{-1} \end{align*}

where $\Omega_{i} = \text{Diag}(\omega_{i}/\sigma_{\epsilon}^{2})$ as before and, of course, if $N=1$ the general expression collapses to the one shown by ericperkerson. I tried using the Woodbury matrix formula to simplify these expressions, but it doesn't help much for the mean or the covariance (contrary to the case in which $\Omega_{i} = \Omega \hspace{2mm} \forall i$), so I leave them as above.
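A minimal sketch of this general expression (the stacked-array interface and the function name are just illustrative choices):

```python
import numpy as np

def posterior_over_periods(Y, W, mu_theta, Sigma_theta, sigma_eps):
    """Posterior of theta after N periods of partially observed signals.

    Y is an (N, d) array of signals and W an (N, d) array of 0/1 indicators,
    with W[i, j] = 1 meaning y_ij was observed; unobserved entries of Y are ignored.
    """
    Sigma_inv = np.linalg.inv(Sigma_theta)
    prec = Sigma_inv.copy()                   # accumulates Sigma^{-1} + sum_i Omega_i
    shift = Sigma_inv @ mu_theta              # accumulates Sigma^{-1} mu_theta + sum_i Omega_i y_i
    for y_i, w_i in zip(Y, W):
        Omega_i = np.diag(w_i) / sigma_eps**2
        prec += Omega_i
        shift += Omega_i @ y_i
    post_cov = np.linalg.inv(prec)            # V(theta | data)
    post_mean = post_cov @ shift              # E(theta | data)
    return post_mean, post_cov
```

With $N=1$ this reproduces the single-period expression above.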

eljsrr