
I want to know how to show that

$$p(a) = \int \delta(a - w^T x) q(w) dw$$ is gaussian, where $q$ is gaussian and $x$ is fixed, and $\delta$ is the Dirac delta function. Everything below is just some motivating background and thoughts on the problem.


In Bayesian logistic regression, we end up with a posterior distribution over the weights $w$ of the model. Using the Laplace approximation, we approximate $p(w|D)$ as a normal distribution (where $D$ is our dataset).

Now we want to find the predictive distribution, which is $$p(y|x,D) = \int p(y|w,x)\, p(w|D)\, dw $$ where $y$ is the label of the new datapoint $x$ which we want to classify, with $$p(y=1|w,x) = \sigma(w^Tx)$$ Also, using the notation from Bishop's PRML, we write $q(w)$ in place of $p(w|D)$.

So now we want to solve $$\int \sigma(w^Tx)\, q(w)\ dw $$
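
(For concreteness, this integral can be estimated by brute-force Monte Carlo over $q(w)$; the sketch below uses made-up values for the posterior mean, covariance, and the new input $x$. The point of the delta trick below is to reduce this $d$-dimensional integral to a one-dimensional one.)

```python
import numpy as np

# Made-up Laplace posterior q(w) = N(m, S) and a new input x (placeholder values).
rng = np.random.default_rng(0)
m = np.array([0.5, -1.0, 0.2])      # posterior mean
S = np.diag([0.3, 0.2, 0.5])        # posterior covariance
x = np.array([1.0, 2.0, -0.5])      # new point to classify

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Brute-force Monte Carlo estimate of p(y=1|x,D) = E_{w~q}[sigmoid(w^T x)].
w_samples = rng.multivariate_normal(m, S, size=100_000)
print(sigmoid(w_samples @ x).mean())
```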

Bishop evaluates this integral in the following way: $$\int \sigma(w^Tx)q(w) dw = \int \int \sigma(a) \delta(a - w^T x)\ da\ q(w) \ dw$$ (where $\delta(z)$ integrates to 1 and is 0 wherever $z \neq 0$).

Reordering the integrals, we have $$\int \sigma(a)\, p(a)\ da$$ where $p(a)$ is the distribution $$p(a) = \int \delta(a - w^T x) q(w) dw$$

Here is where my question is. According to Bishop, $p(a)$ can be seen as the marginal distribution of a joint gaussian distribution. Marginals of a gaussian are gaussian too, so we can just solve for the mean and variance.
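
For reference, once gaussianity of $p(a)$ is granted, its mean and variance follow from linearity of $a = w^T x$: writing $q(w) = \mathcal{N}(w \mid m_N, S_N)$ as in PRML, $$\mu_a = \mathbb{E}[w^T x] = m_N^T x, \qquad \sigma_a^2 = \operatorname{var}[w^T x] = x^T S_N x.$$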

However, I don't understand how this can be shown. I can show that integrating out one dimension of a multivariate gaussian results in a gaussian marginal on the remaining dimensions. However, that doesn't seem to transfer over here, where we are integrating over all the directions orthogonal to $x$.


1 Answer


After thinking about it for a while, the answer came to me.

Consider trying to marginalize out the second dimension of a two-dimensional gaussian.

$$f(a) = \int p(a,x_2)\ dx_2 $$

This can be rewritten as

$$f(a) = \int p(x_1,x_2) \delta(x_1-a)\ d \vec x $$

Here, we are integrating over all values of $x_1$ and $x_2$, but the values for which $x_1 \neq a$ are ignored due to the Dirac delta function.
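
(As a quick numerical sanity check of this two-dimensional picture, with a made-up mean and covariance, the empirical moments of $x_1$ match those of the analytic marginal $\mathcal{N}(\mu_1, \Sigma_{11})$:)

```python
import numpy as np

# Made-up 2D gaussian; marginalizing out x_2 should leave N(mu[0], Sigma[0, 0]) on x_1.
rng = np.random.default_rng(1)
mu = np.array([0.7, -0.3])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

samples = rng.multivariate_normal(mu, Sigma, size=200_000)
x1 = samples[:, 0]
print(x1.mean(), x1.var())   # empirical moments of the x_1 marginal
print(mu[0], Sigma[0, 0])    # analytic moments of N(mu_1, Sigma_11)
```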

This is very similar to the expression we want to show is gaussian, which is

$$f(a) = \int p_{\vec x ~\sim \mathcal{N}(\mu, \Sigma)}(\vec x) \delta(x^T w - a)\ d \vec x $$

For convenience, assume $\mu = 0$ here. This expression is much easier to handle if we rotate both the gaussian distribution and the vector $w$. Define $R$ to be a rotation matrix such that $Rw$ is nonzero only in the first component (i.e., $Rw$ is aligned with the first axis). We must also rotate the covariance matrix accordingly.
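
One concrete way to build such an $R$ (just a sketch; any orthogonal matrix whose first row is $w/\|w\|$ works, and the helper name below is made up) is to complete $w$ to an orthonormal basis, for example via a QR decomposition:

```python
import numpy as np

def align_first_axis(w):
    """Return an orthogonal R with R @ w = (||w||, 0, ..., 0).
    It may be a reflection rather than a proper rotation, but only
    orthogonality matters for the change of variables."""
    w = np.asarray(w, dtype=float)
    Q, _ = np.linalg.qr(np.column_stack([w, np.eye(len(w))]))  # Q[:, 0] is proportional to w
    R = Q.T
    if R[0] @ w < 0:          # fix the sign so the first component of R @ w is +||w||
        R[0] = -R[0]
    return R

w = np.array([2.0, -1.0, 0.5])
R = align_first_axis(w)
print(np.round(R @ w, 6))     # approximately [||w||, 0, 0]
print(np.round(R @ R.T, 6))   # approximately the identity, so R is orthogonal
```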

Then, changing variables by this rotation, we have $$f(a) = \int p_{\vec x \sim \mathcal{N}(\mu, R\Sigma R^T)}(\vec x) \delta(x^T Rw - a)\ d \vec x $$ (with $\mu = 0$ the mean is unaffected by the rotation; in general it would become $R\mu$).

But since $Rw$ is nonzero only in the first component, $x^T Rw = \|w\|\, x_1$. Assuming without loss of generality that $\|w\| = 1$ (rescaling a gaussian variable by a nonzero constant keeps it gaussian), this is $$f(a) = \int p_{\vec x \sim \mathcal{N}(\mu, R\Sigma R^T)}(\vec x) \delta(x_1-a)\ d \vec x $$

which, as we saw above, is simply the usual expression for marginalization along one axis:

$$f(a) = \int p_{\vec x ~\sim \mathcal{N}(\mu, R\Sigma R^T)}(a,x_2) d x_2 $$

So it is simply the marginal distribution of a rotated gaussian. In more than two dimensions, this generalizes to integrating out all but the first dimension of the rotated gaussian, which leaves us with a univariate gaussian, which is what I wanted to show.
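
A small Monte Carlo check of this conclusion (the mean, covariance, and $w$ below are made up): $a = w^T x$ with $x \sim \mathcal{N}(\mu, \Sigma)$ should have mean $w^T \mu$, variance $w^T \Sigma w$, and standard-normal quantiles after standardizing.

```python
import numpy as np
from scipy.stats import norm

# Check that a = w^T x is gaussian with mean w^T mu and variance w^T Sigma w
# (made-up mu, Sigma, and w for illustration).
rng = np.random.default_rng(2)
mu = np.array([0.2, -0.4, 1.0])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + np.eye(3)           # a random positive-definite covariance
w = np.array([2.0, -1.0, 0.5])

x = rng.multivariate_normal(mu, Sigma, size=500_000)
a = x @ w

print(a.mean(), w @ mu)               # empirical vs. analytic mean
print(a.var(), w @ Sigma @ w)         # empirical vs. analytic variance

# Standardized quantiles of a should match standard-normal quantiles if a is gaussian.
qs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
print(np.round(np.quantile((a - a.mean()) / a.std(), qs), 3))
print(np.round(norm.ppf(qs), 3))
```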
