Applying Bayes' rule in a more technical way when densities don't exist

Question

Say $y \mid x \sim \text{Normal}(Ax, B)$ and $x \sim \text{Normal}(c,D)$. Let's assume further that $y \in \mathbb{R}^1$ and $x \in \mathbb{R}^2$.

To find $p(x \mid y)$, we would usually write

\begin{align*} p(x \mid y) &\propto p(y \mid x) p(x) \\ &\propto \exp\left[ -\frac{1}{2}\left\{ (y - Ax)^\intercal B^{-1}(y-Ax) + (x-c)^\intercal D^{-1}(x-c) \right\} \right] \\ &\propto \exp\left[ -\frac{1}{2}\left\{ x^\intercal A^\intercal B^{-1} A x - 2 x^\intercal A^\intercal B^{-1} y + x^\intercal D^{-1}x -2 x^\intercal D^{-1} c \right\} \right] \end{align*}

This gives a posterior precision of $A^\intercal B^{-1} A + D^{-1}$ and a posterior mean of $\left[A^\intercal B^{-1} A + D^{-1} \right]^{-1}\left[A^\intercal B^{-1} y + D^{-1} c \right]$.
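A minimal NumPy sketch of this update, assuming small illustrative values for $A$, $B$, $c$, $D$ and $y$ (hypothetical, not from the question). It computes the posterior from the precision form above and cross-checks it against the standard formula for conditioning the jointly Gaussian pair $(x, y)$.

```python
import numpy as np

def posterior_from_precisions(A, B, c, D, y):
    """Posterior of x given y via the precision form; requires B to be invertible."""
    B_inv = np.linalg.inv(B)
    D_inv = np.linalg.inv(D)
    precision = A.T @ B_inv @ A + D_inv          # posterior precision
    cov = np.linalg.inv(precision)
    mean = cov @ (A.T @ B_inv @ y + D_inv @ c)   # posterior mean
    return mean, cov

def posterior_from_joint(A, B, c, D, y):
    """Same posterior by conditioning the joint Gaussian of (x, y); needs A D A^T + B invertible."""
    S = A @ D @ A.T + B                          # marginal covariance of y
    K = D @ A.T @ np.linalg.inv(S)               # Cov(x, y) times S^{-1}
    mean = c + K @ (y - A @ c)
    cov = D - K @ A @ D
    return mean, cov

# Illustrative (hypothetical) values: x in R^2, y in R^1.
A = np.array([[1.0, 2.0]])
B = np.array([[0.5]])
c = np.array([0.0, 1.0])
D = np.array([[1.0, 0.3], [0.3, 2.0]])
y = np.array([1.5])

m1, C1 = posterior_from_precisions(A, B, c, D, y)
m2, C2 = posterior_from_joint(A, B, c, D, y)
print(np.allclose(m1, m2), np.allclose(C1, C2))  # True True
```

The second form is the Woodbury-identity rewrite of the first; unlike the first, it never inverts $B$ itself.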

However, when $B$ is the $0$ matrix, $y \mid x$ follows a "singular" or "degenerate" multivariate normal distribution (a point mass at $Ax$), and none of the above work is valid because the density $p(y \mid x)$ doesn't exist: the conditional distribution of $y$ is not dominated by Lebesgue measure.
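A rough numerical sketch of what the $B = 0$ case looks like, assuming hypothetical values for $A$, $c$, $D$ and $y$ (not from the question). The precision form cannot be evaluated because $B^{-1}$ does not exist, but the standard Gaussian-conditioning formulas $\operatorname{E}[x \mid y] = c + DA^\intercal (ADA^\intercal + B)^{-1}(y - Ac)$ and $\operatorname{Cov}[x \mid y] = D - DA^\intercal(ADA^\intercal + B)^{-1}AD$ only need $ADA^\intercal + B$ to be invertible. With $B = 0$ they return a rank-one (degenerate) covariance concentrated on the line $\{x : Ax = y\}$, and posteriors with small $B > 0$ approach it.

```python
import numpy as np

def posterior_from_joint(A, B, c, D, y):
    """Condition the joint Gaussian of (x, y); works for B = 0 if A D A^T is invertible."""
    S = A @ D @ A.T + B
    K = D @ A.T @ np.linalg.inv(S)
    mean = c + K @ (y - A @ c)
    cov = D - K @ A @ D
    return mean, cov

# Hypothetical values: x in R^2, y in R^1.
A = np.array([[1.0, 2.0]])
c = np.array([0.0, 1.0])
D = np.array([[1.0, 0.3], [0.3, 2.0]])
y = np.array([1.5])

mean0, cov0 = posterior_from_joint(A, np.zeros((1, 1)), c, D, y)
print(A @ mean0)                               # equals y: the mean lies on the line Ax = y
print(np.linalg.matrix_rank(cov0, tol=1e-10))  # 1: a degenerate (singular) normal
print(A @ cov0)                                # ~0: no spread off the line

# Posteriors with small B > 0 converge to the B = 0 answer.
for b in [1e-1, 1e-3, 1e-6]:
    m, C = posterior_from_joint(A, np.array([[b]]), c, D, y)
    print(b, np.abs(m - mean0).max(), np.abs(C - cov0).max())
```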

Writing $f$ for the prior density of $x$, we can apply the more general Bayes' rule

$$ p(x \in S \mid y) = \frac{\int_S f(x)\mathbb{1}(Ax=y)\,dx}{\int_{\mathbb{R}^2} f(x)\mathbb{1}(Ax=y)\,dx} $$

but I'm having some trouble with the integral. Can anyone help?

Edit:

The last expression isn't valid: the expectation of that indicator under the prior is $0$, because the line $\{x : Ax = y\}$ has Lebesgue measure zero in $\mathbb{R}^2$. It's like asking for the probability that a continuous random variable is exactly equal to a specific value.
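A small sketch of why the indicator integrates to zero, under hypothetical values $A = (1, 2)$, $c = (0, 1)^\intercal$, $D = \begin{pmatrix} 1 & 0.3 \\ 0.3 & 2 \end{pmatrix}$ and $y = 1.5$ (not from the question). Under the prior, $Ax$ is univariate normal with mean $Ac = 2$ and variance $ADA^\intercal = 10.2$, so the prior mass of the strip $\{x : |Ax - y| < \varepsilon\}$ has a closed form. It shrinks roughly like $2\varepsilon$ times the density of $Ax$ at $y$, so the exact line gets mass $0$.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical values: A = (1, 2), c = (0, 1), D = [[1, 0.3], [0.3, 2]], y = 1.5.
# Under the prior, Ax ~ N(A c, A D A^T) = N(2.0, 10.2).
mean_Ax = 2.0
sd_Ax = math.sqrt(10.2)
y = 1.5

def strip_probability(eps):
    """Prior probability that |Ax - y| < eps, i.e. that x lands in a strip around the line Ax = y."""
    upper = normal_cdf((y + eps - mean_Ax) / sd_Ax)
    lower = normal_cdf((y - eps - mean_Ax) / sd_Ax)
    return upper - lower

# Density of Ax at y under the prior.
density_at_y = math.exp(-0.5 * ((y - mean_Ax) / sd_Ax) ** 2) / (sd_Ax * math.sqrt(2 * math.pi))

for eps in [1.0, 0.1, 0.01, 0.001]:
    # The last column approaches 1: the strip's mass is about 2 * eps * density, which tends to 0.
    print(eps, strip_probability(eps), strip_probability(eps) / (2 * eps * density_at_y))
```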

$X$ given $Y=y$ _has_ a density, but with respect to the Lebesgue measure on the subspace $Ax=y$. — Xi'an, May 01 '20 at 08:22
@Xi’an With respect to Lebesgue measure? Wouldn’t that make it continuous and have a density in the usual way? — Dave, May 02 '20 at 01:56
@Taylor So you mean something different than Radon-Nikodym derivative with respect to Lebesgue measure resulting in a density in the usual sense, right? — Dave, May 02 '20 at 03:36
@Dave: "Lebesgue measure on the _subspace_" not on the whole space. — Xi'an, May 02 '20 at 18:25

Taylor · Accepted Answer · 2020-05-06T00:33:00.640

Write $A = (a_1, a_2)$ and define

\begin{align*} S &= \{ x : Ax = y, x_2 \le s\} \\ &= \{ (x_1, x_2) : a_1 x_1 + a_2 x_2 = y, x_2 \le s \} . \end{align*}

We can say that \begin{align*} p(x \in S \mid y) &= \frac{\int_{-\infty}^s \int_{\mathbb{R}} f_{x_1,x_2}(x_1,x_2)\,\delta_{a_1 x_1 + a_2 x_2}(dy)\, dx_2 }{\int_{\mathbb{R}} \int_{\mathbb{R}} f_{x_1,x_2}(x_1,x_2)\,\delta_{a_1 x_1 + a_2 x_2}(dy)\, dx_2} \\ &= \frac{P(a_1 x_1 + a_2 x_2 = y, x_2 \le s)}{P(a_1 x_1 + a_2 x_2 = y)} \end{align*}

taking care not to write $dx$ or $dx_1dx_2$ anywhere (because that would imply we have a Radon-Nikodym derivative with respect to the product Lebesgue measure).

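Substituting $x_1 = (y - a_2 x_2)/a_1$ (this assumes $a_1 \neq 0$; the Jacobian factor $1/|a_1|$ produced by the delta is the same in the numerator and denominator and cancels), we get $$ p(x \in S \mid y) = \frac{\int_{-\infty}^s f_{x_1,x_2}[(y - a_2x_2)/a_1, x_2] dx_2}{\int_{-\infty}^{\infty} f_{x_1,x_2}[(y - a_2x_2)/a_1, x_2] dx_2} \tag{3} $$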
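A numerical sketch of equation (3), assuming hypothetical values $a_1 = 1$, $a_2 = 2$, $c = (0, 1)^\intercal$, $D = \begin{pmatrix} 1 & 0.3 \\ 0.3 & 2 \end{pmatrix}$ and $y = 1.5$ (not from the question). It evaluates the prior density along the line $x_1 = (y - a_2 x_2)/a_1$, takes a ratio of Riemann sums on a uniform $x_2$ grid, and compares the result with $P(x_2 \le s \mid Ax = y)$ from standard Gaussian conditioning, under which $x_2 \mid Ax = y$ is normal with mean $c_2 + (DA^\intercal)_2 (y - Ac)/(ADA^\intercal)$ and variance $D_{22} - (DA^\intercal)_2^2/(ADA^\intercal)$.

```python
import math
import numpy as np

# Hypothetical values (not from the question): A = (a1, a2), prior x ~ N(c, D), observed y.
a1, a2 = 1.0, 2.0
c = np.array([0.0, 1.0])
D = np.array([[1.0, 0.3], [0.3, 2.0]])
D_inv = np.linalg.inv(D)
y = 1.5

def prior_density(x1, x2):
    """Bivariate normal density f_{x1,x2}(x1, x2), vectorised over 1-D arrays."""
    diff = np.stack([x1 - c[0], x2 - c[1]])             # shape (2, n)
    quad = np.einsum("in,ij,jn->n", diff, D_inv, diff)  # (x - c)^T D^{-1} (x - c)
    return np.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(np.linalg.det(D)))

def formula_3(s, n=200_001, half_width=20.0):
    """Equation (3) on a uniform x2 grid; assumes a1 != 0. The grid spacing cancels in the ratio."""
    x2 = np.linspace(c[1] - half_width, c[1] + half_width, n)
    g = prior_density((y - a2 * x2) / a1, x2)           # f evaluated on the line Ax = y
    return g[x2 <= s].sum() / g.sum()

def closed_form(s):
    """P(x2 <= s | Ax = y) from Gaussian conditioning, as an independent cross-check."""
    A = np.array([a1, a2])
    DA = D @ A                                          # Cov(x, Ax)
    var_Ax = A @ DA                                     # Var(Ax)
    m2 = c[1] + DA[1] * (y - A @ c) / var_Ax
    v2 = D[1, 1] - DA[1] ** 2 / var_Ax
    return 0.5 * (1.0 + math.erf((s - m2) / math.sqrt(2.0 * v2)))

for s in [0.0, 0.5, 1.0, 2.0]:
    print(s, formula_3(s), closed_form(s))
```

The two columns should agree up to the grid error, which is of the order of the grid spacing.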

Ben · Answer 2 · 2020-05-05T22:22:37.540


The degenerate normal distribution with zero variance matrix is just a point-mass distribution on its mean (if you take it to be well-defined at all), so you have $\mathbb{P}(Y=Ax \mid X=x) = 1$. To facilitate analysis, define the set-valued function:

$$\mathcal{H}(y) \equiv \{ x \in \mathbb{R}^2 \mid y=Ax \} \quad \quad \quad \text{for all } y \in \mathbb{R},$$

so we have $f(y|x) = \mathbb{I}(x \in \mathcal{H}(y))$. The relevant application of Bayes' theorem is:$^\dagger$

$$\begin{aligned} p(X \in \mathcal{S}|Y=y) &= \frac{f(x \in \mathcal{S}, y)}{f(y)} \\[6pt] &= \frac{\int_\mathcal{S} f(x,y) \ dx}{\int_{\mathbb{R}^2} f(x,y) \ dx} \\[6pt] &= \frac{\int_\mathcal{S} f(y|x) f(x) \ dx}{\int_{\mathbb{R}^2} f(y|x) f(x) \ dx} \\[6pt] &= \frac{\int_\mathcal{S} \mathbb{I}(x \in \mathcal{H}(y)) f(x) \ dx}{\int_{\mathbb{R}^2} \mathbb{I}(x \in \mathcal{H}(y)) f(x) \ dx} \\[6pt] &= \frac{\int_{\mathcal{S} \ \cap \ \mathcal{H}(y)} f(x) \ dx}{\int_{\mathcal{H}(y)} f(x) \ dx}. \end{aligned}$$

$^\dagger$ For simplicity, we will ignore the pathological case where $A=\mathbf{0}$ and we condition on $y \neq 0$. In that pathological case we have $\mathcal{H}(y) = \varnothing$ and so we cannot deploy the equation shown. To deal with that pathological case, see here.
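A sketch of one way to make the last line of this derivation non-zero, under hypothetical values $A = (1, 2)$, $c = (0, 1)^\intercal$, $D = \begin{pmatrix} 1 & 0.3 \\ 0.3 & 2 \end{pmatrix}$ and $y = 1.5$ (not from the question): read the $dx$ in $\int_{\mathcal{H}(y)} f(x)\,dx$ as arc length along the line $\mathcal{H}(y)$ (Lebesgue measure on the subspace, as in Xi'an's comment on the question) rather than as two-dimensional Lebesgue measure, under which the line has measure zero. Because the constraint is linear, arc length is a constant multiple of $dx_2$ along the line, so this ratio should agree with Gaussian conditioning; the code checks that for $\mathcal{S} = \{x : x_2 \le s\}$.

```python
import math
import numpy as np

# Hypothetical values (not from the question).
A = np.array([1.0, 2.0])
c = np.array([0.0, 1.0])
D = np.array([[1.0, 0.3], [0.3, 2.0]])
D_inv = np.linalg.inv(D)
y = 1.5

def prior_density(points):
    """N(c, D) density at each row of `points`, an array of shape (n, 2)."""
    diff = points - c
    quad = np.einsum("ni,ij,nj->n", diff, D_inv, diff)
    return np.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(np.linalg.det(D)))

# Parametrise H(y) = {x : A x = y} by arc length t: x(t) = x0 + t * u.
x0 = A * y / (A @ A)                               # the point of H(y) closest to the origin
u = np.array([-A[1], A[0]]) / np.linalg.norm(A)    # unit vector along H(y)
t = np.linspace(-30.0, 30.0, 300_001)              # uniform arc-length grid
points = x0 + t[:, None] * u
f_on_line = prior_density(points)

def ratio_on_line(s):
    """Ratio of arc-length integrals of f over S ∩ H(y) and over H(y), with S = {x : x2 <= s}."""
    inside = points[:, 1] <= s
    return f_on_line[inside].sum() / f_on_line.sum()

def gaussian_conditioning(s):
    """P(x2 <= s | Ax = y) from the joint Gaussian of (x, Ax), for comparison."""
    DA = D @ A
    var_Ax = A @ DA
    m2 = c[1] + DA[1] * (y - A @ c) / var_Ax
    v2 = D[1, 1] - DA[1] ** 2 / var_Ax
    return 0.5 * (1.0 + math.erf((s - m2) / math.sqrt(2.0 * v2)))

for s in [0.0, 0.5, 1.0, 2.0]:
    print(s, ratio_on_line(s), gaussian_conditioning(s))
```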

edited May 05 '20 at 22:22

answered May 01 '20 at 07:21

Instead of $p(x|y) = \mathbb{I}(x \in \mathcal{H}(y))$ do you mean $f(y|x) = \mathbb{I}(x \in \mathcal{H}(y))$? – Taylor May 01 '20 at 18:40
Thanks --- corrected. – Ben May 02 '20 at 01:15
Also, isn't $\int_{\mathcal{H}(y)} f(x) dx$ always zero? – Taylor May 05 '20 at 21:37
Since $y \in \mathbb{R}$ and $x \in \mathbb{R}^2$ you have $y=Ax=a_1 x_1 + a_2 x_2$, which means that the set $\mathcal{H}(y)$ is going to be a line in $\mathbb{R}^2$. (For simplicity, I will ignore the pathological case where $A=\mathbf{0}$ and you condition on $y \neq 0$.) So, aside from the pathological case, you are integrating the bivariate normal density over a line, which will give you a non-zero integral. – Ben May 05 '20 at 22:04
What I'm getting at is that I think you need to replace an indicator with a Dirac delta. Integrating over the line is like integrating $\int_a^a f(x)\,dx$, which is always $0$. This is the main difference right now between my answer and your answer. – Taylor May 12 '20 at 20:10