Marginalization over the nuisance variable

Question

I was reading a paper in which they state $$ \text{P}(\mathbf{y}, \mathbf{f}, \mathbf{u}) = \text{P}(\mathbf{y}| \mathbf{f})\text{P}(\mathbf{f}| \mathbf{u})\text{P}(\mathbf{u})$$ with $\mathbf{f}$ being conditionally independent of $\mathbf{y}$ given $\mathbf{u}$. They then give the following equation, which marginalizes out $\mathbf{u}$ to obtain the posterior for $\mathbf{f}$: $$ \text{P}(\mathbf{f}| \mathbf{y}) = \int \text{P}(\mathbf{f}| \mathbf{u})\text{P}(\mathbf{u}| \mathbf{y})d\mathbf{u}$$

How did they get from the joint to the marginal over $\mathbf{u}$?
In the general case, where there is no independence or conditional independence, $\text{P}(\mathbf{y}, \mathbf{f}, \mathbf{u})$ has 3 different expressions based on the chain rule. How would I go about choosing 1 of the 3 expressions so that I can marginalize out $\mathbf{u}$?

EDIT: Thanks all. I think the issue is that I haven't provided enough context, rather than a typo. (I took the equations as is from the papers referenced below.) The paper is https://arxiv.org/pdf/1705.08933.pdf (Eq 1), which deals with Deep Gaussian Processes (although in this case it is explaining a single-layer GP, which is analogous to a sparse GP, i.e. a standard GP with an additional (output) variable $\mathbf{u}$). What they wrote is: $$ \text{p}(\mathbf{y}, \mathbf{f}, \mathbf{u}) =\text{p}(\mathbf{f}| \mathbf{u})\text{p}(\mathbf{u}) \prod_{i=1}^{N}\, \text{p}(y_i| f_i)$$ where the product of the first two terms on the RHS is the GP prior and the last term is the likelihood. Another paper (https://arxiv.org/pdf/1806.05490.pdf), which is a continuation of the former, gives the same formula as $$ \text{p}(\mathbf{y}, \mathbf{f}, \mathbf{u}) = \text{p}(\mathbf{y}| \mathbf{f})\text{p}(\mathbf{f}| \mathbf{u})\text{p}(\mathbf{u})$$ exactly as I wrote it previously.

Do you have a typo in the very first equation, on the RHS? Should the first term be p(y|u)? — Shang Zhang, Jul 26 '21 at 04:01
(1) & (2) This is an identity called Chapman-Kolmogorov [wikipedia](https://en.wikipedia.org/wiki/Chapman–Kolmogorov_equation), [mathworld](https://mathworld.wolfram.com/Chapman-KolmogorovEquation.html) — msuzen, Jul 26 '21 at 21:36

score 4 · Answer 1 · answered Jul 26 '21 at 11:53

As mentioned in the comments, the first multiplicand should be $p(y|u)$ because it's originally $p(y|f,u)$ and it's stated that $y$ and $f$ are conditionally independent given $u$.

For the integral, you have $$p(f|y)=\int p(f,u|y)du=\int \underbrace{p(f|y,u)}_{p(f|u)}p(u|y)du$$
More generally, whatever quantity you want to calculate, you include $u$ in the joint expression and then integrate over it, just as in the middle part above.
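As a quick sanity check of the integral above, here is a minimal sketch using a scalar linear-Gaussian model in which $f$ and $y$ each depend only on $u$, so $f \perp y \mid u$ holds by construction. The model and the values of `a`, `b`, `s_f2`, `s_y2` and `y_obs` are illustrative assumptions, not taken from the paper. It compares $p(f|y)$ obtained by conditioning the joint Gaussian of $(f, y)$ directly with the result of integrating $p(f|u)p(u|y)$ over $u$.

```python
import math

# Hypothetical linear-Gaussian model (all numbers are illustrative):
#   u ~ N(0, 1),  f | u ~ N(a*u, s_f2),  y | u ~ N(b*u, s_y2)
a, b = 1.5, -0.7
s_f2, s_y2 = 0.3, 0.5
y_obs = 2.0

# Route 1: condition the joint Gaussian of (f, y) directly on y = y_obs.
var_f = a * a + s_f2
var_y = b * b + s_y2
cov_fy = a * b
mean_direct = cov_fy / var_y * y_obs
var_direct = var_f - cov_fy ** 2 / var_y

# Route 2: compute p(u | y) first, then integrate p(f | u) p(u | y) over u.
var_u_post = 1.0 / (1.0 + b * b / s_y2)       # posterior variance of u given y
mean_u_post = var_u_post * b * y_obs / s_y2   # posterior mean of u given y
mean_marg = a * mean_u_post                   # mean of the integral over u
var_marg = a * a * var_u_post + s_f2          # variance of the integral over u

print(math.isclose(mean_direct, mean_marg), math.isclose(var_direct, var_marg))
```

Both routes give the same Gaussian here because $p(f|y,u) = p(f|u)$ in this model. If $f$ also depended on $y$ directly, Route 2 would need $p(f|u,y)$ instead.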

answered Jul 26 '21 at 11:53

gunes


Nice explanation, but original post states `In the general case where there are no independence or conditional independence`. – msuzen Jul 26 '21 at 21:41
@MehmetSuzen thanks for your comment! That assumption is for the second part, and I believe I addressed it in (2). But, just to reiterate for the OP: in the general case, the equation given at the beginning doesn't hold. – gunes Jul 26 '21 at 21:50
Thank you for the response. Got it, it was for part (2). – msuzen Jul 26 '21 at 21:58
Thanks @gunes, I made an edit to the original post. Basically, I took the equations exactly as they are in the paper, so I'm not sure about a typo. Maybe it's more that I haven't provided enough context for the problem? – lefe Jul 29 '21 at 01:34

score 3 · Accepted Answer · answered Jul 26 '21 at 11:50

If you have the joint density $P(y,f,u)$ and you know that $f\perp y | u$, then you can rewrite the joint as

$$P(y,f,u)=P(y,f|u)P(u)=P(y|u)P(f|u)P(u) \ \ (1)$$

Then, to obtain the posterior $P(f|y)$ by marginalizing over $u$, you calculate

$$P(f|y)=\int P(f,u|y)du = \int \frac{P(f,u,y)}{P(y)}du$$

Now, using the decomposition $(1)$ of the joint distribution, you have that

$$P(f|y)= \int \frac{P(y|u)P(f|u)P(u)}{P(y)}du = \int P(f|u)P(u|y)du$$

where the last step uses Bayes' rule, $P(y|u)P(u)/P(y) = P(u|y)$.

I don't really know which other expressions you are referring to, but in cases where you do not have any kind of independence, you would calculate $P(f|y)$ as

$$P(f|y) = \int P(f,u|y)du=\int \frac{P(f,u,y)}{P(y)}du = \int \frac{P(f|u,y)P(u,y)}{P(y)}du=\int\frac{P(f|u,y)P(u|y)P(y)}{P(y)}du$$

$$=\int P(f|u,y)P(u|y)du$$
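The difference between the two cases can be checked numerically on a small discrete example. The sketch below uses a made-up joint table over binary $y$, $f$, $u$ (the weights are an assumption chosen so that $f$ and $y$ stay dependent given $u$). It compares the direct value of $P(f|y)$ with the general formula $\sum_u P(f|u,y)P(u|y)$ and with the shortcut $\sum_u P(f|u)P(u|y)$.

```python
from itertools import product

# Hypothetical joint table P(y, f, u) on binary variables. The (y == f) bonus
# keeps f and y dependent even after conditioning on u.
states = (0, 1)
weights = {(y, f, u): 1 + 3 * (y == f) + u
           for y, f, u in product(states, repeat=3)}
total = sum(weights.values())
joint = {key: w / total for key, w in weights.items()}

def marginal(keep):
    """Sum the joint over every axis not listed in keep (0=y, 1=f, 2=u)."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

p_y, p_u = marginal((0,)), marginal((2,))
p_yf, p_yu, p_fu = marginal((0, 1)), marginal((0, 2)), marginal((1, 2))

def direct(f, y):
    # P(f | y) read straight off the joint
    return p_yf[(y, f)] / p_y[(y,)]

def general(f, y):
    # sum_u P(f | u, y) P(u | y): holds for any joint
    return sum((joint[(y, f, u)] / p_yu[(y, u)]) * (p_yu[(y, u)] / p_y[(y,)])
               for u in states)

def shortcut(f, y):
    # sum_u P(f | u) P(u | y): needs f independent of y given u
    return sum((p_fu[(f, u)] / p_u[(u,)]) * (p_yu[(y, u)] / p_y[(y,)])
               for u in states)

pairs = list(product(states, repeat=2))
err_general = max(abs(direct(f, y) - general(f, y)) for f, y in pairs)
err_shortcut = max(abs(direct(f, y) - shortcut(f, y)) for f, y in pairs)
print(err_general, err_shortcut)
```

For this table the general formula matches the direct value up to rounding, while the shortcut is off by about 0.25, because $P(f|u,y) \neq P(f|u)$ here.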

(+1) for detailed explanations. – gunes Jul 26 '21 at 11:55

msuzen · Answer 3 · 2021-07-28T00:06:31.837

There are already great answers, especially from @gunes.

In the most generic case, with no independence or conditional independence assumption, marginalising $\mathbf{f}$ over $\mathbf{u}$ given $\mathbf{y}$ can be viewed as a Markov chain, $\mathbf{f} \rightarrow \mathbf{u} \rightarrow \mathbf{y}$, and is expressed by the Chapman-Kolmogorov equation (CKE), or identity. This is exactly what the authors have written:

$$ \text{P}(\mathbf{f}| \mathbf{y}) = \int \text{P}(\mathbf{f}| \mathbf{u})\text{P}(\mathbf{u}| \mathbf{y})d\mathbf{u} $$

This relationship is a CKE. Each conditional probability is a transition density in the Markovian sense. The variable $\mathbf{u}$ plays the role of a nuisance variable that we marginalise over: we are not interested in it, but it appears as an important intermediate state.

We can write similar relationships in any ordering, such as $\mathbf{y} \rightarrow \mathbf{u} \rightarrow \mathbf{f}$ or $\mathbf{u} \rightarrow \mathbf{f} \rightarrow \mathbf{y}$, etc.
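For discrete states, and assuming the Markov property actually holds for the chain (that is, $\mathbf{f}$ depends on $\mathbf{y}$ only through $\mathbf{u}$), the CKE above reduces to a product of transition matrices. The sketch below uses made-up column-stochastic tables. The names `P_u_given_y`, `P_f_given_u` and the helper `matmul` are hypothetical and only serve the illustration.

```python
# Hypothetical transition tables; every column sums to 1.
# P_u_given_y[u][y] = P(u | y), with 3 states for u and 2 states for y.
P_u_given_y = [[0.2, 0.6],
               [0.5, 0.3],
               [0.3, 0.1]]
# P_f_given_u[f][u] = P(f | u), with 2 states for f.
P_f_given_u = [[0.9, 0.4, 0.1],
               [0.1, 0.6, 0.9]]

def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Chapman-Kolmogorov: P(f | y) = sum_u P(f | u) P(u | y)
P_f_given_y = matmul(P_f_given_u, P_u_given_y)

# Each column of the result is again a probability distribution over f.
col_sums = [sum(row[j] for row in P_f_given_y) for j in range(2)]
print(P_f_given_y, col_sums)
```

Summing over the intermediate index `k` is exactly the marginalisation over $\mathbf{u}$, which is why composing two transition steps corresponds to multiplying their matrices.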

(+1) Thanks @Mehmet Suzen for the insight – lefe Jul 29 '21 at 01:43
