
I'm reading Andrew Ng's notes on machine learning, and on page 12 of this document, he makes a step in his proof that I'm trying to decipher:

Let $\textbf{x} = \left( 1 , x_1 , x_2 , \cdots , x_n \right)^T$, a vector of variables, and $\theta = \left( \theta_0 , \theta_1 , \theta_2 , \cdots , \theta_n \right)^T$, a vector of linear coefficients of those variables. Let's define $y$ as

$$y = \theta ^T \textbf{x} + \epsilon$$ where $\epsilon \sim \mathcal{N}(0,\sigma^2)$, that is

$$p(\epsilon) = \frac{1}{\sqrt{2 \pi}\,\sigma}\exp \left( -\frac{\epsilon^2}{2\sigma^2}\right).$$

The next line states the following about the conditional probability of $y$ given $\textbf{x}$, where the coefficients $\theta$ are treated as deterministic:

$$p(y|\textbf{x};\theta) = \frac{1}{\sqrt{2 \pi}\,\sigma}\exp\left( -\frac{(y- \theta ^T \textbf{x})^2}{ 2\sigma^2}\right)$$

Can someone help me see how we get this conditional distribution?

Phonon

1 Answer


In this context, $\textbf{x}$ and $\theta$ can be thought of as constants, as can their product $\theta^T\textbf{x}$. It might be easier to follow if you replace this product with a single constant $C$. Then $y = C + \epsilon \sim \mathcal{N}(C,\sigma^2)$, which is exactly the density in your last line: adding a constant to a Gaussian random variable shifts its mean by that constant and leaves the variance unchanged. Equivalently, substituting $\epsilon = y - \theta^T\textbf{x}$ into the density of $\epsilon$ gives the conditional density of $y$. I think you are likely just overthinking things because you're not used to the vector notation used here.
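
If it helps to see this numerically, here is a minimal sketch (with made-up values for $\theta$, $\textbf{x}$, and $\sigma$; none of these come from the notes) that samples $y = \theta^T\textbf{x} + \epsilon$ for a fixed $\textbf{x}$ and checks that the samples match $\mathcal{N}(\theta^T\textbf{x}, \sigma^2)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical values, just for illustration
theta = np.array([1.0, 2.0, -0.5])   # (theta_0, theta_1, theta_2)
x = np.array([1.0, 0.3, 1.7])        # leading 1 pairs with the intercept theta_0
sigma = 0.8                          # noise standard deviation

C = theta @ x                        # the constant theta^T x for this fixed x
eps = rng.normal(0.0, sigma, size=1_000_000)
y = C + eps                          # samples of y given x and theta

# Empirical mean and std should be close to C and sigma
print(y.mean(), y.std())

# Histogram-based density estimate vs. the claimed closed form N(C, sigma^2)
hist, edges = np.histogram(y, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
closed_form = norm.pdf(centers, loc=C, scale=sigma)
print(np.max(np.abs(hist - closed_form)))   # small, up to Monte Carlo error
```

The point of the sketch is only that, once $\textbf{x}$ and $\theta$ are fixed, $\theta^T\textbf{x}$ is just a number, so the randomness in $y$ comes entirely from $\epsilon$.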

bnaul