
I am going through the theory behind the factor analysis models given here.

Let's say our model is \begin{align} y_i = \Lambda x_i + \epsilon_i, \end{align}

where $y_i$ is the $p$-dimensional observation and $x_i \sim \mathcal N(0, I_q)$ is the $q$-dimensional underlying latent variable. $\Lambda$ is the $p \times q$ loading matrix and $\epsilon_i \sim \mathcal N(0, \Psi)$ is the error term.

Now I want the conditional distribution of the latent variable $x$ given $\{y, \Lambda, \Psi\}$, i.e. \begin{align} p(x \mid y, \Lambda, \Psi), \end{align} and in particular its expectation.

Here is my attempt: \begin{align} \Lambda x & = y - \epsilon \\ \implies \Lambda^T\Lambda x & = \Lambda^T(y - \epsilon) \\ \implies x & = (\Lambda^T\Lambda)^{-1}\Lambda^T(y - \epsilon). \end{align}

If I take the conditional expected value of $x$ now, I get \begin{align} E(x \mid y, \Lambda, \Psi) &= E\left((\Lambda^T\Lambda)^{-1}\Lambda^T(y - \epsilon)\right) \\ &= (\Lambda^T\Lambda)^{-1}\Lambda^T(y - E(\epsilon)) \\ &= (\Lambda^T\Lambda)^{-1}\Lambda^T y. \end{align}

But the expression given on Slide 18 of those slides for the conditional expected value of $x$ reads $\Lambda^T(\Lambda\Lambda^T + \Psi)^{-1}y$.
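A quick numerical check (a rough numpy sketch; the dimensions and seed are made up purely for illustration) confirms that my matrix and the slide's matrix really are different:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 5, 2  # made-up dimensions, just for illustration

Lam = rng.normal(size=(p, q))            # loading matrix Lambda
Psi = np.diag(rng.uniform(0.5, 1.5, p))  # diagonal noise covariance Psi

mine = np.linalg.inv(Lam.T @ Lam) @ Lam.T         # my derivation
slide = Lam.T @ np.linalg.inv(Lam @ Lam.T + Psi)  # Slide 18
print(np.max(np.abs(mine - slide)))               # clearly nonzero
```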

Can you please point out the mistake I am making in my derivation? Thanks in advance!

  • I just updated to explain what went wrong with your proof. Basically, your method is also a perfectly fine way to do it, but you made the mistake of saying $E(\varepsilon | y_i ) = 0$ when that's not actually the case – jld Jul 13 '18 at 18:33
  • That’s brilliant! I think this answers my doubt completely!! – honeybadger Jul 14 '18 at 01:30
  • This was a sneaky error, I also set $E(\varepsilon | y) = 0$ the first time I went through it without even thinking about it :) – jld Jul 14 '18 at 16:41

1 Answer


$\newcommand{\e}{\varepsilon}$$\newcommand{\L}{\Lambda}$We have $y_i = \mu + \L x_i + \e_i$ (keeping a general mean $\mu$; in your model $\mu = 0$).

We want to think of the jointly Gaussian RV $$ {x_i \choose y_i} \sim \mathcal N\left({0 \choose \mu}, \begin{bmatrix} I & \L^T \\ \L & \L\L^T + \Psi\end{bmatrix} \right) $$ (slide 17 in that pdf) and from this we get $$ x_i | y_i \sim \mathcal N\left(\L^T(\L\L^T+\Psi)^{-1}(y_i - \mu), I - \L^T(\L\L^T+\Psi)^{-1}\L\right) $$ (slide 18) so $$ E(x_i \vert y_i, \L, \Psi) = \L^T(\L\L^T+\Psi)^{-1}(y_i - \mu). $$
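Here is a minimal numpy sketch of this conditioning step (my addition, not from the slides; dimensions are arbitrary and I take $\mu = 0$). It just applies the standard Gaussian conditioning formulas $E(x \mid y) = \Sigma_{xy}\Sigma_{yy}^{-1}(y-\mu)$ and $\mathrm{Cov}(x \mid y) = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$ to the joint covariance above:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 5, 2  # arbitrary dimensions for illustration

Lam = rng.normal(size=(p, q))            # Lambda
Psi = np.diag(rng.uniform(0.5, 1.5, p))  # Psi (diagonal)

# Blocks of the joint covariance of (x_i, y_i) from slide 17.
S_xx = np.eye(q)
S_xy = Lam.T                  # Cov(x_i, y_i) = Lambda^T
S_yy = Lam @ Lam.T + Psi      # Cov(y_i, y_i)

y = rng.normal(size=p)        # a made-up observation, mu = 0

# Standard Gaussian conditioning.
E_x_given_y = S_xy @ np.linalg.solve(S_yy, y)  # Lambda^T (Lambda Lambda^T + Psi)^{-1} y
Cov_x_given_y = S_xx - S_xy @ np.linalg.solve(S_yy, S_xy.T)
print(E_x_given_y)
print(Cov_x_given_y)
```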


Here's where you went wrong: $$ E(x_i | y_i) = (\L^T\L)^{-1}\L^T(y_i - \mu - E(\e | y_i)) $$ but the mistake is that while $E(\e) = 0$, $E(\e | y_i)$ is not necessarily $0$.

Again think of the joint distribution of ${\e \choose y_i}$. Note $Cov(\e, y_i) = Cov(\e, \L x_i + \e) = \Psi$ so $$ {\e \choose y_i} \sim \mathcal N\left({0 \choose \mu}, \begin{bmatrix} \Psi & \Psi \\ \Psi & \L\L^T + \Psi\end{bmatrix}\right). $$ This means $$ E(\e|y_i) = \Psi(\L\L^T + \Psi)^{-1}(y_i - \mu) $$ so taking $\mu = 0$ now for simplicity we have $$ E(x_i | y_i) = (\L^T\L)^{-1}\L^T(y_i - \Psi(\L\L^T + \Psi)^{-1}y_i) \\ = (\L^T\L)^{-1}\L^T(I - \Psi(\L\L^T + \Psi)^{-1}) y_i. $$
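You can also see $E(\e \mid y_i) \neq 0$ by simulation (my addition, with made-up sizes): for jointly Gaussian variables the conditional expectation is the linear regression of $\e$ on $y_i$, so the fitted coefficient matrix should converge to $\Psi(\L\L^T + \Psi)^{-1}$. A rough numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 4, 2, 200_000  # made-up sizes; large n to tame sampling noise

Lam = rng.normal(size=(p, q))
Psi = np.diag(rng.uniform(0.5, 1.5, p))

# Simulate the model with mu = 0: y = Lam x + eps.
x = rng.normal(size=(n, q))
eps = rng.multivariate_normal(np.zeros(p), Psi, size=n)
y = x @ Lam.T + eps

# Regress eps on y (no intercept needed: both are zero-mean).
B, *_ = np.linalg.lstsq(y, eps, rcond=None)
B_theory = Psi @ np.linalg.inv(Lam @ Lam.T + Psi)
print(np.max(np.abs(B.T - B_theory)))  # small, and shrinks as n grows
```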

Now consider $$ \L^T(\L\L^T+\Psi)^{-1} - (\L^T\L)^{-1}\L^T(I - \Psi(\L\L^T + \Psi)^{-1}) \\ = \L^T(\L\L^T+\Psi)^{-1} - (\L^T\L)^{-1}\L^T + (\L^T\L)^{-1}\L^T \Psi(\L\L^T + \Psi)^{-1} \\ = \left(\L^T - (\L^T\L)^{-1}\L^T(\L\L^T + \Psi) + (\L^T\L)^{-1}\L^T \Psi\right)(\L\L^T + \Psi)^{-1} \\ = \left(\L^T - (\L^T\L)^{-1}\L^T\L\L^T - (\L^T\L)^{-1}\L^T \Psi + (\L^T\L)^{-1}\L^T \Psi\right)(\L\L^T + \Psi)^{-1} \\ = \left(\L^T - \L^T - (\L^T\L)^{-1}\L^T \Psi + (\L^T\L)^{-1}\L^T \Psi\right)(\L\L^T + \Psi)^{-1} \\ = \mathbf 0. $$

This means $$ \L^T(\L\L^T+\Psi)^{-1} = (\L^T\L)^{-1}\L^T(I - \Psi(\L\L^T + \Psi)^{-1}) $$

so actually this method, when correcting the mistake about $E(\e | y_i)$, yields the exact same answer! It just takes some work to turn one into the other.
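A quick numerical sanity check of this identity (my addition, arbitrary dimensions) shows the two matrices agree to machine precision:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 6, 3  # arbitrary dimensions

Lam = rng.normal(size=(p, q))
Psi = np.diag(rng.uniform(0.5, 1.5, p))
S = Lam @ Lam.T + Psi

lhs = Lam.T @ np.linalg.inv(S)
rhs = np.linalg.inv(Lam.T @ Lam) @ Lam.T @ (np.eye(p) - Psi @ np.linalg.inv(S))
print(np.max(np.abs(lhs - rhs)))  # ~1e-15, i.e. zero up to round-off
```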

  • I understand that my approach is flawed. Can you please also explain how the conditional distribution follows from the joint distribution in slide 17? – honeybadger Jul 13 '18 at 15:28
  • @kasa that part has been discussed on this site here: https://stats.stackexchange.com/questions/30588/deriving-the-conditional-distributions-of-a-multivariate-normal-distribution plus the wikipedia article for the multivariate normal discusses it a bit: https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions – jld Jul 13 '18 at 15:34