6

Given $ M \in \mathbb{R}^{N \times N} $ which is a Positive Definite Matrix.
Let $ \hat{x} $ the MMSE of $ x $ given $ z $, namely $ \hat{x} = \mathbb{E} \left[ x \mid z \right] $.

Prove the equivenalce of the following expressions:

  1. $ \arg \min_{\hat{x}} \mathbb{E} \left[ {\left( \hat{x} - x \right)}^{T} \left( \hat{x} - x \right) \mid z \right] $.

  2. $ \arg \min_{\hat{x}} \mathbb{E} \left[ {\left( \hat{x} - x \right)}^{T} M \left( \hat{x} - x \right) \mid z \right] $.

  3. $ \arg \min_{\hat{x}} \operatorname{Tr} \left( M \mathbb{E} \left[ {\left( \hat{x} - x \right)}^{T} \left( \hat{x} - x \right) \mid z \right] \right) $

Equivalence means all are actually minimized by $ \hat{x} = \mathbb{E} \left[ x \mid z \right] $.

The equivalence of 2 and 3 is easy using the Cyclic Property of the $ \operatorname{Tr} \left( \cdot \right) $ operator.
Yet showing the invertible transformation keeps the solution (Equivalence of 1 and 2) isn't trivial.

Royi
  • 33,983
  • 4
  • 72
  • 179
  • 1
    This question has been [asked simultaneously on stats.SE](http://stats.stackexchange.com/q/91228/6633) where it seems a more natural fit. I recommend closing it here and migrating it to stats.SE – Dilip Sarwate Mar 25 '14 at 03:29
  • @DilipSarwate, As you can see, It was answered here and not there. Since both in beta and have low exposure with almost no overlap in people I see no harm in it. Where I'll get the answer I will post it on the other one so they both will have this knowledge. – Royi Mar 25 '14 at 04:54

2 Answers2

4

Since $ M \in \mathbb{S}^{N}_{++} $ (In other convention $ M \succ 0 $) by Cholesky Decomposition there is a Triangular Matrix $ R \in \mathbb{R}^{N \times N} $ such that $ M = {R}^{T} R $.

Using this fact one could prove $ 1 \iff 2 $ as following:

$$\begin{align*} \arg \min_{\hat{x}} \mathbb{E} \left[ {\left( \hat{x} - x \right)}^{T} M \left( \hat{x} - x \right) \mid z \right] & = \arg \min_{\hat{x}} \mathbb{E} \left[ {\left( \hat{x} - x \right)}^{T} {R}^{T} R \left( \hat{x} - x \right) \mid z \right] && \text{$M = {R}^{T} R $} \\ & = \arg \min_{\hat{x}} \mathbb{E} \left[ {\left( R \left( \hat{x} - x \right) \right)}^{T} \left( R \left( \hat{x} - x \right) \right) \mid z \right] && \text{} \\ & = \arg \min_{\hat{x}} \mathbb{E} \left[ {\left( R \hat{x} - y \right)}^{T} \left( R \hat{x} - y \right) \mid z \right] && \text{Defining $ y = R x $} \end{align*}$$

Clearly, the above, as the classic MMSE problem, with respect to $ y $ given $ z $ which is minimized when:

$$ R \hat{x} = \mathbb{E} \left[ y \mid z \right] \Rightarrow \hat{x} = {R}^{-1} \mathbb{E} \left[ y \mid z \right] $$

Since $ M \succ 0 $ then $ {R}^{-1} $ is defined and the above is valid.
Moreover by linearity of the Expectation Operator:

$$ \hat{x} = {R}^{-1} \mathbb{E} \left[ y \mid z \right] = {R}^{-1} \mathbb{E} \left[ R x \mid z \right] = {R}^{-1} R \mathbb{E} \left[ x \mid z \right] = \mathbb{E} \left[ x \mid z \right] $$

Hence the minimizer of 1 indeed minimizes 2.

Using the cyclic property of the Trace Operator and the linearity of the Expectation Operator (Hence one could change the order of the $ \operatorname{Tr} \left( \cdot \right) $ and $ \mathbb{E} \left[ \cdot \right] $) one could easily show $ 2 \iff 3 $ hence the problem is solved.

Royi
  • 33,983
  • 4
  • 72
  • 179
2

Use the Cholesky decomposition of $M$ as a change of coordinates.

$tr(ABC) = tr(CAB)$ - trace is invariant under cyclic permutations, and $tr(scalar)=scalar$. Thus, $tr(M E[ x x^T]) = tr(E[M x x^T]) = E[tr(M x x^T) ] = E[tr( x^T M x ) ] = E[x^T M x]$. Now, replace $x$ with $\hat{x}-x$.

Royi
  • 33,983
  • 4
  • 72
  • 179
Batman
  • 1,071
  • 6
  • 12
  • The property still holds in this case. As for the first comment, use the first line of the answer. – Batman Mar 25 '14 at 13:48
  • 1
    PD by definition requires hermitian/symmetric, so cholesky exists. Try it on your own, noting that $\hat{x} = E[X|Z] = E[X|\text{invertible function of }z]$ and using the cholesky decomposition to find the invertible function. – Batman Mar 25 '14 at 17:17
  • I don't think the function should be applied on $ z $. Unless the answe I added is wrong. Though the property of the Conditional Expectation you mentioned is intuitive, I would be happy to see its derivation. Is there a place to see a full derivation of it? – Royi Jul 01 '18 at 15:46
  • @Royi: I loved your proof. In Batman's statement regarding $invertible function$, I think that he is using the fact that $E[X | z, g(z)] = E[X | g(z)]$. This can be proven using the tower property of conditional expectation. ( condition on g(z) again and then the inside z drops out ). But I'm still not clear how one can use that result to prove the original statement. If you do, the knowledge is appreciated. – mark leeds Jul 02 '18 at 00:17
  • @markleeds, I also don't see how to apply a function on $ z $ in order to show the equivalence of 1 and 2. Hopefully Batman will explain. – Royi Jul 02 '18 at 04:14
  • @Royi: maybe he meant what you did and just explained it differently ? Hopefully he or someone else can explain. Still, I found your detailed proof quite educational. Thanks. – mark leeds Jul 02 '18 at 06:43
  • @Royi: I think I get it now. He doesn't mean NOW, replace it. He means, take the proof that he wrote in the comment but, instead of using $x$ when starting out with the first line. use $(\hat{x} - x)$ everywhere in all the steps. In this way, nothing will change ( all the steps remain legal ) and the end will be what is needed. I hope this is clear. The NOW at the last line confused me so maybe you also. – mark leeds Jul 02 '18 at 06:49
  • @markleeds, I was talking about > Try it on your own, noting that x^=E[X|Z]=E[X|invertible function of z] and using the cholesky decomposition to find the invertible function. Batman says something about using invertible function on $ z $ to show $ 1 \iff 2 $. I guess he meant the Cholesky Decomposition of $ M $ but I don't get what transform to apply on $ z $. – Royi Jul 02 '18 at 07:00
  • @Royi: Sorry. I was confused by both Batman's answer and his comment regarding invertible function. Now I'm only confused by his comment regarding invertible function. So, I can't help you there because, even with the $E(X| z, g(z) ] = E[X | g(z)]$ I also still don't see how to get rid of M ? – mark leeds Jul 02 '18 at 07:19