
$E[X|Y]$ is equal to the almost surely unique and deterministic function of $Y$, say $\varphi (Y)$, such that

$$E[X f(Y)]= E[\varphi (Y)f(Y)]$$

for all bounded, measurable and non-negative $f$'s.

How can I prove this fact?

In particular, how do the a.s. uniqueness and the equality of the conditional expectation to $\varphi(Y)$ follow from the conditional expectation defined as usual by $$E[X|Y]=\sum_x x\cdot P(X=x|Y)?$$

Xi'an
user852508
  • This definition of expectation is not one to use when you are considering a fact that might hold almost surely: you need a measure-theoretic definition. – whuber Dec 17 '20 at 21:06
  • @whuber what is wrong with the universal property at the top? Isn't that the measure-theoretic definition (except for the fact that one can define it with or without test functions...)? – Fabian Werner Dec 17 '20 at 22:03
  • Caution: The definition is that $E[X|Y]$ is the almost $\sigma(Y)$-surely defined function that does some stuff. Let $A, B$ be two such functions, then integrate the test functions $\phi(\omega) = 1_{A > B}(\omega)$ and $\phi(\omega) = 1_{B > A}(\omega)$ multiplied by $A-B$. Then use that if $\int f = 0$ and $f$ is non-negative then $f=0$ (almost surely for the correct sigma algebra). – Fabian Werner Dec 17 '20 at 22:03
  • @Fabian The problem lies in the definition of conditional expectation, which works only for variables of at most countable support. – whuber Dec 17 '20 at 22:04
  • @whuber ooh, you were talking about the explicit way of writing E[X|Y]... now I understand. – Fabian Werner Dec 17 '20 at 22:05
  • Concerning the second point (the explicit way of writing $E[X|Y]$): first of all, this can only be meaningful if the space $X$ maps to is discrete (otherwise we need an integral instead of a sum). The only proof I have ever seen is quite complicated: https://stats.stackexchange.com/questions/97422/probability-distribution-of-functions-of-random-variables – Fabian Werner Dec 17 '20 at 22:07
  • I do not understand: for me the first part IS the definition of the conditional expectation... – Xi'an Dec 18 '20 at 09:35
  • @Xi'an For me only the last but one line in my original question IS the definition of a discrete conditional expectation. – user852508 Dec 18 '20 at 16:30
  • @user852508: I corrected a typo in your question two days ago, namely that $\varphi(X)$ should be $\varphi(Y)$, and now find the typo back again. Could you please fix it? – Xi'an Dec 21 '20 at 17:30

1 Answer


In the discrete case, when both state spaces $\mathfrak X$ and $\mathfrak Y$ are finite or countable, the notion of "almost surely" simplifies to "surely" (provided every atom carries positive probability) and any function $f:\mathfrak Y\longrightarrow\mathbb R$ is measurable with respect to the counting measure.

Now, in the discrete case, \begin{align} \mathbb E[Xf(Y)]&=\sum_{x\in\mathfrak X,\,y\in\mathfrak Y}xf(y)\mathbb P(X=x,Y=y)\\ &=\sum_{x\in\mathfrak X,\,y\in\mathfrak Y}xf(y)\mathbb P(X=x|Y=y)\mathbb P(Y=y)\\ &=\sum_{y\in\mathfrak Y}\underbrace{\left\{\sum_{x\in\mathfrak X}x\mathbb P(X=x|Y=y)\right\}}_{\stackrel{\text{(a)}}{=}\,\mathbb E[X|Y=y]}f(y)\mathbb P(Y=y)\\ &=\sum_{y\in\mathfrak Y} \mathbb E[X|Y=y] f(y)\mathbb P(Y=y)\\ &= \mathbb E\{\mathbb E[X|Y] f(Y)\} \end{align} where equality (a) uses the standard definition of the conditional expectation in the discrete case. This is essentially the proof of the Law of Total Expectation with an extra factor $f(Y)$. It means that $\mathbb E[Xf(Y)]$ can be written as $\mathbb E[\varphi^0(Y)f(Y)]$ for the particular function $$\varphi^0(y)=\mathbb E[X|Y=y]$$
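This direction of the argument can be checked numerically. The sketch below uses a made-up $3\times 2$ joint probability table and an arbitrary bounded test function $f$ (both are illustrative assumptions, not part of the answer), and verifies that $\mathbb E[Xf(Y)]$ equals $\mathbb E[\varphi^0(Y)f(Y)]$ with $\varphi^0(y)=\mathbb E[X|Y=y]$ computed from the discrete definition:

```python
import numpy as np

# Toy discrete joint law (made up for illustration): X in {0,1,2}, Y in {0,1}.
p_xy = np.array([[0.10, 0.15],
                 [0.20, 0.05],
                 [0.30, 0.20]])   # p_xy[i, j] = P(X = x_i, Y = y_j)
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([0.0, 1.0])

def f(y):
    return np.exp(-y)             # any bounded measurable test function

p_y = p_xy.sum(axis=0)                                # P(Y = y_j)
phi0 = (x_vals[:, None] * p_xy).sum(axis=0) / p_y     # phi0(y_j) = E[X | Y = y_j]

lhs = (x_vals[:, None] * f(y_vals)[None, :] * p_xy).sum()  # E[X f(Y)]
rhs = (phi0 * f(y_vals) * p_y).sum()                       # E[phi0(Y) f(Y)]
```

The two sums agree to machine precision, for any choice of `f`, which is exactly the identity derived above.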

Conversely, suppose there exists a function $\varphi:\mathfrak Y\longrightarrow \text{conv}(\mathfrak X)$ [where $\text{conv}(\mathfrak X)$ denotes the convex envelope of $\mathfrak X$, in order to include all possible combinations of the elements of $\mathfrak X$] such that $$\mathbb E[Xf(Y)] = \mathbb E[\varphi(Y)f(Y)]\tag{1}$$ for every real function $f:\mathfrak Y\longrightarrow\mathbb R$. Then (1) applies in particular to the indicator function $$f_\xi(y)=\mathbb I_{y=\xi}= \begin{cases}1 &\text{ if }y=\xi\\0 &\text{ if }y\ne\xi\\ \end{cases}$$ for every $\xi\in\mathfrak Y$. This leads to $$\mathbb E[X\mathbb I_{Y=\xi}]=\underbrace{\mathbb E[\mathbb E[X|Y=\xi]\mathbb I_{Y=\xi}]}_{\stackrel{\text{(b)}}{=}\, \mathbb E[X|Y=\xi]\mathbb P(Y=\xi)}$$ being equal by (1) to $$\underbrace{\mathbb E[\varphi(\xi)\mathbb I_{Y=\xi}]}_{\stackrel{\text{(c)}}{=}\,\varphi(\xi)\mathbb P(Y=\xi)}$$ where both (b) and (c) are explained by the fact that $$\mathbb E[\mathbb E[X|Y]\mathbb I_{Y=\xi}]=\mathbb E[\underbrace{\mathbb E[X|Y=\xi]}_{\text{constant}}\mathbb I_{Y=\xi}]=\mathbb E[X|Y=\xi]\times\mathbb E[\mathbb I_{Y=\xi}]$$ and $$\mathbb E[\varphi(Y)\mathbb I_{Y=\xi}]=\mathbb E[\varphi(\xi)\mathbb I_{Y=\xi}]=\varphi(\xi)\times\mathbb E[\mathbb I_{Y=\xi}]$$ Hence $$\varphi(\xi)=\mathbb E[X|Y=\xi]$$ for all $\xi$'s such that $\mathbb P(Y=\xi)>0$, which is the claimed (almost sure) uniqueness.
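The uniqueness direction can also be checked numerically: plugging each indicator test function $f_\xi$ into (1) pins down $\varphi(\xi)$ as $\mathbb E[X\mathbb I_{Y=\xi}]/\mathbb P(Y=\xi)$, which must coincide with $\mathbb E[X|Y=\xi]$. The joint law below is a made-up illustration, not part of the answer:

```python
import numpy as np

# Toy discrete joint law (made up for illustration): X in {0,1,2}, Y in {0,1}.
p_xy = np.array([[0.10, 0.15],
                 [0.20, 0.05],
                 [0.30, 0.20]])   # p_xy[i, j] = P(X = x_i, Y = y_j)
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([0.0, 1.0])

pairs = []
for j, xi in enumerate(y_vals):
    f_xi = (y_vals == xi).astype(float)                    # indicator f_xi(y) = 1{y = xi}
    lhs = (x_vals[:, None] * f_xi[None, :] * p_xy).sum()   # E[X 1{Y = xi}]
    p_y_xi = p_xy[:, j].sum()                              # P(Y = xi)
    phi_xi = lhs / p_y_xi                 # solve E[X 1{Y=xi}] = phi(xi) P(Y=xi)
    cond_exp = (x_vals * p_xy[:, j]).sum() / p_y_xi        # E[X | Y = xi] directly
    pairs.append((phi_xi, cond_exp))
```

For every atom $\xi$ with $\mathbb P(Y=\xi)>0$, `phi_xi` and `cond_exp` coincide, matching the pointwise identification at the end of the proof.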

Xi'an
  • Thank you for your answer, but you are too fast for me. What follows for my purposes from this: $$=\sum_{y\in\mathfrak Y}\underbrace{\left\{\sum_{x\in\mathfrak X}x\mathbb P(X=x|Y=y)\right\}}_{\mathbb E[X|Y=y]}f(y)\mathbb P(Y=y)$$ (i.e., how do I obtain from this the r.h.s. $\mathbb{E}[\varphi(Y)f(Y)]$ from my OQ), and what does $\text{conv}(\mathfrak X)$ mean? – user852508 Dec 19 '20 at 16:29
  • Also, how do these equalities follow: $$\mathbb E[X\mathbb I_{Y=\xi}]=\underbrace{\mathbb E[\mathbb E[X|Y=\xi]\mathbb I_{Y=\xi}]}_{= \mathbb E[X|Y=\xi]\mathbb P(Y=\xi)}=\underbrace{\mathbb E[\varphi(\xi)\mathbb I_{Y=\xi}]}_{=\varphi(\xi)\mathbb P(Y=\xi)}$$ – user852508 Dec 19 '20 at 17:49
  • Please, how is $$f_{\xi}(y)={\mathbb I}_{Y=\xi}$$ defined? – user852508 Dec 20 '20 at 20:43