
What is the exact meaning of the subscript notation $\mathbb{E}_X[f(X)]$ in conditional expectations, in the framework of measure theory? These subscripts do not appear in the definition of conditional expectation, but we may see them, for example, on this page of Wikipedia. (Note that it wasn't always the case: the same page did not use them a few months ago.)

What should be for example the meaning of $\mathbb{E}_X[X+Y]$ with $X\sim\mathcal{N}(0,1)$ and $Y=X+1$ ?
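
For concreteness, this example can be checked by simulation. A minimal Monte Carlo sketch (the sampling code and seed are illustrative; it reads the unsubscripted expectation as being over the joint distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # X ~ N(0, 1)
y = x + 1                           # Y = X + 1 is fully determined by X

# The unsubscripted expectation E[X + Y], taken over the joint
# distribution, is E[X] + E[Y] = 0 + 1 = 1.
print(np.mean(x + y))
```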

Emile
  • No doubt someone will chime in with formal definitions; informally, all expectations are expectations over the distribution of (/expectation with respect to) some (possibly multivariate) random variable, whether it has been explicitly specified or left implied. In many cases it's obvious ($\text{E}(X)$ implies $\text{E}_X(X)$ rather than $\text{E}_W(X)$). Other times, it's necessary to distinguish; consider the law of total variance for example: $\text{Var}[Y] = \text{E}_X\left[\text{Var}[Y\mid X]\right] + \text{Var}_X\left[\text{E}[Y\mid X]\right]$. – Glen_b Oct 12 '13 at 11:32
  • @Glen_b Is it really necessary to specify in the law of total variance? As $E[Y|X]=f(X)$, for some $f$, isn't it clear that $\text{Var}[E[Y|X]]$ is over $X$? – Thomas Ahle Oct 18 '15 at 17:32
  • @ThomasAhle You're quite right -- "necessary" was too strong a word for that example. While strictly speaking it should be clear, it's often a point of confusion for readers unused to working with it, so it's common, rather than necessary, to be explicit about it. There are some expressions involving expectations where you can't be sure without specifying, but that isn't really one of them. – Glen_b Oct 18 '15 at 21:39
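
Glen_b's law-of-total-variance example can also be checked numerically; here is a minimal Monte Carlo sketch (the joint model is my own choice, for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
x = rng.standard_normal(n)            # X ~ N(0, 1)
y = 2.0 * x + rng.standard_normal(n)  # Y | X = x  ~  N(2x, 1)

# Var[Y] = E_X[Var[Y|X]] + Var_X[E[Y|X]]
within = 1.0               # Var[Y | X = x] = 1 for every x, so its mean is 1
between = np.var(2.0 * x)  # Var_X[E[Y|X]], since E[Y | X] = 2X
print(np.var(y), within + between)  # both sides ≈ 5
```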

2 Answers


In an expression where more than one random variable is involved, the symbol $E$ alone does not clarify with respect to which random variable the expected value is "taken". For example

$$E[h(X,Y)] \overset{?}{=} \int_{-\infty}^{\infty} h(x,y) f_X(x)\,dx$$ or $$E[h(X,Y)] \overset{?}{=} \int_{-\infty}^\infty h(x,y) f_Y(y)\,dy$$

Neither. When many random variables are involved, and there is no subscript in the $E$ symbol, the expected value is taken with respect to their joint distribution:

$$E[h(X,Y)] = \int_{-\infty}^\infty \int_{-\infty}^\infty h(x,y) f_{XY}(x,y) \, dx \, dy$$
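
A Monte Carlo sketch of this "joint" reading (the model, $h$, and numbers are my own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# An illustrative joint distribution: X ~ N(0, 1), Y = 0.5*X + noise
x = rng.standard_normal(500_000)
y = 0.5 * x + rng.standard_normal(500_000)

# E[h(X, Y)] with h(x, y) = x*y, taken over the joint distribution;
# analytically E[XY] = Cov(X, Y) = 0.5 here.
print(np.mean(x * y))
```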

When a subscript is present... in some cases it tells us on which variable we should condition. So

$$E_X[h(X,Y)] = E[h(X,Y)\mid X] = \int_{-\infty}^\infty h(x,y) f_{h(X,Y)\mid X}(h(x,y)\mid x)\,dy $$

Here, we "integrate out" the $Y$ variable, and we are left with a function of $X$.
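
In this "conditioning" reading, fixing $X = x$ leaves only $Y$ random. A Monte Carlo sketch (the conditional model and $h$ are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def cond_expect(x_fixed, n=500_000):
    # Conditioning on X = x_fixed: only Y stays random.
    # Illustrative model: Y | X = x  ~  N(0.5*x, 1)
    y = 0.5 * x_fixed + rng.standard_normal(n)
    # With h(x, y) = x*y:  E[h(X,Y) | X = x] = x * E[Y | X = x] = 0.5*x**2
    return np.mean(x_fixed * y)

print(cond_expect(2.0))  # one number per fixed x: a function of x
```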

...But in other cases, it tells us which marginal density to use for the "averaging"

$$E_X[h(X,Y)] = \int_{-\infty}^\infty h(x,y) f_{X}(x) \, dx $$

Here, we "average over" the $X$ variable, and we are left with a function of $Y$.
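
A sketch of this "marginal averaging" reading, where the result is a function of $y$ (again with an illustrative $h$ of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)  # draws from the marginal f_X

def avg_over_x(y):
    # "Average over" X only, with h(x, y) = x**2 + y;
    # analytically the result is E[X^2] + y = 1 + y, a function of y.
    return np.mean(x**2 + y)

print(avg_over_x(0.0))
print(avg_over_x(3.0))
```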

Rather confusing, I would say, but who said that scientific notation is totally free of ambiguity or multiple use? You should look at how each author defines the use of such symbols.

Alecos Papadopoulos
  • I have two questions. 1) Not sure if I understand this properly, can I interpret the expectation as one of the first two equations, if either X or Y has been fixed? 2) Can you give an example for EQ 4 and EQ 5? I have a hard time interpreting them and I think concrete examples would help. Thanks! – ceiling cat Oct 14 '13 at 18:50
  • @ceiling cat 1) $E[h(X,\bar y)] = \int_{-\infty}^{\infty} h(x,\bar y) f_X(x)dx$ is correct because essentially you do _not_ have _two_ random variables any more. Likewise for fixing $X$ to $\bar x$. – Alecos Papadopoulos Oct 14 '13 at 19:10
  • @ceiling cat 2)-EQ5: Consider $Z = X^2(Y-(Y+2)^3) = h(X,Y)$. $Z$ is a random variable alright (for an appropriate support). Then using the specific meaning for the shorthand notation, $E_X(Z)=E_X[h(X,Y)] = \int_{-\infty}^{\infty} x^2(y-(y+2)^3) f_{X}(x)dx$ where $f_{X}(x)$ is the density of $X$ (whatever that is). Obviously $Y$ is not integrated, and it will stay intact. But the result you will obtain won't be a number (as in my previous comment), but a random variable (a function of $Y$), since $Y$ here is _not_ fixed, just not integrated out. – Alecos Papadopoulos Oct 14 '13 at 19:18
  • @ceiling cat In both cases in my two previous comments, the "mechanics" of the mathematical calculations will be the same. The end results though have different interpretations. – Alecos Papadopoulos Oct 14 '13 at 19:22
  • @ceiling cat 2)-EQ4: Consider the same random variable $Z$. Its expected value conditional on $X$ is (using the other meaning for the shorthand notation) $E_X[Z] = E(Z\mid X) = \int_{-\infty}^{\infty} z f_{Z|X}(z\mid x)dz$. Note that here the $x$'s and $y$'s do not appear directly in the integrand -they are "condensed" in the $z$ symbol. – Alecos Papadopoulos Oct 14 '13 at 19:29
  • @Alecos $E_X(Z)=E(Z|X)$ seems inconsistent. The LHS integrates $X$ out while the RHS is a function of $X$. – A.S. Jan 24 '16 at 01:30
  • @A.S. I specifically stressed in my answer that the (bad anyway) notation $E_X(Z)$ may have different meanings in the hands of different authors, and I provided two of them that I know of. Your comment stands only if $E_X(Z)$ has a single, universal meaning and use -and it doesn't. Avoid it anyway, if you ask me. – Alecos Papadopoulos Jan 24 '16 at 01:47
  • In light of all this discussion: http://www.stat.cmu.edu/~ryantibs/statml/review/modelbasics.pdf, page 2, the second equation (the one after y = f(x) + e) -- how come there is still an expectation on the right side of the equation? There are no random variables left. Shouldn't it be simply s^2 + (f(x) - fhat(x))^2? – Cagdas Ozgenc Oct 21 '17 at 13:42
  • @CagdasOzgenc Yes, since I understand that lowercase $x$ is a specific realization and $X$ is a random variable. But this notational distinction is not always observed and I believe the author has in mind the random variable $X$, or $X=x$ but $\forall x$. – Alecos Papadopoulos Oct 21 '17 at 16:06
  • @AlecosPapadopoulos I checked some other derivations of the same thing. It seems they are also averaging over the training data, which affects f_hat. But how the hell is anybody supposed to know that? I think the subscript should be enforced to its full extent in all expectations. – Cagdas Ozgenc Oct 21 '17 at 16:27
  • @AlecosPapadopoulos Is that last integral the conditional expectation of $h(X,Y)$ with respect to $Y$? – Colin Hicks May 23 '20 at 03:19
  • @ColinHicks No. As you can see under the integral, no conditional density appears. The integral before the last is an example of a conditional expectation. – Alecos Papadopoulos May 23 '20 at 13:35
  • @AlecosPapadopoulos This topic confused me and I asked a [similar question](https://math.stackexchange.com/questions/3688785/law-of-unconscious-statistician-for-conditional-expectation) where I was told (swapping which variable is conditioned on) $E(h(X,Y)|X)=\int_{-\infty}^{\infty}h(x,y)f_{Y|X}(y|x)dy$. I realize you are using non-standard notation with $dh$, but is this a contradiction with what I was told, or are the integrals somehow "equivalent"? – Colin Hicks May 24 '20 at 17:29
  • @ColinHicks Of course they are not. This is the whole point of the post, to alert that "compact" notations may refer to different things, depending on how each author defines the compact notation. – Alecos Papadopoulos May 24 '20 at 19:25
  • @ColinHicks ...and $dh$ was a typo, unnoticed for so many years.... corrected. Plus some more discussion. – Alecos Papadopoulos May 24 '20 at 19:31
  • @AlecosPapadopoulos ah so $f_{h(X,Y)|X}(h(x,y)|x)$ is the same as $f_{Y|X}(y|x)$ or is this just a case of the two integrals being equal? I’m assuming the latter but also I feel like there could be a simple change of variables that would clear up why both integrals produce the same output. I have realized that I need to learn more basics before getting here as I do not have a rigorous understanding of these constructs which compounds the confusion notationally. I really appreciate your help btw after all these years haha – Colin Hicks May 24 '20 at 21:03
  • @ColinHicks It cannot be the same. $h(x,y)$ is a bivariate function that can have any form. For example, we could have $h(x,y) =2 + x^2 + xy^3$. Even if we condition on $X$, and so treat the $x$'s in $h(x,y)$ as fixed numbers, evidently the distribution of $h(x,y)$ conditional on $X$ cannot be the same as the distribution of $Y$ conditional on $X$. – Alecos Papadopoulos May 24 '20 at 23:28
  • @AlecosPapadopoulos for a joint distribution of two variables $f_{X,Y}(x,y)$ we have that a conditional distribution and a marginal distribution are different things. But is that the same for expectation values? I have difficulties with interpreting $$E_X[h(X,Y)] = \int_{-\infty}^\infty h(x,y) f_{X}(x) \, dx = E_1(y)$$ for X, Y dependent this is different from $$E_X[h(X,Y)] = \int_{-\infty}^\infty h(x,y) f_{X|Y}(x|y) \, dx =E_2(y)$$ The $E_2(y)$ I can interpret as the average/mean of $h(x,y)$ conditional on $y$. But I can not see what sort of expectation/average $E_1(y)$ is. – Sextus Empiricus May 30 '20 at 20:22
  • Or should this $E_1(y)$ formulation maybe be interpreted in the sense that X and Y are either not jointly distributed variables (e.g. Y is a parameter) or in the sense that X and Y are independent variables in which case $f_X = f_{X|Y}$, ie. marginal and conditional distributions are equal. – Sextus Empiricus May 30 '20 at 20:30
  • Should this $E[h(X,Y)\mid X] $ $$\int_{-\infty}^\infty h(x,y) f_{h(X,Y)\mid X}(h(x,y)\mid x)\,dy$$ not instead be either $$\int_{-\infty}^\infty h(x,y) f_{h(X,Y)\mid X}(h(x,y)\mid x)\,dh(x,y)$$ or $$\int_{-\infty}^\infty h(x,y) f_{Y\mid X}(y\mid x)\,dy$$ – Sextus Empiricus May 30 '20 at 20:34
  • @SextusEmpiricus $E_1(y)$ is essentially treating $h$ as $h(x;y)$, that is, as a function of only $x$ with $y$ as a free parameter, which is distinct from conditional expectation. As for the other two integrals in your last comment, see [law of unconscious statistician](https://en.wikipedia.org/wiki/Law_of_the_unconscious_statistician) and [this question](https://stats.stackexchange.com/questions/475418/law-of-unconscious-statistician-for-conditional-expectation) – Colin Hicks Aug 13 '20 at 19:56
  • @ColinHicks my problem with the expression for $E_1(y)$ is that it multiplies $h(x;y)$ (an expression whose value depends on $y$) with the marginal $f(x)$ instead of with the conditional $f(x;y)$. I do not understand the meaning of $E_1(y)$. What do you mean that it is distinct from conditional expectation? Is the $y$ in $h(x;y)$ different from the $y$ in the density function $f(x;y)$? – Sextus Empiricus Aug 14 '20 at 06:13
  • @ColinHicks so for distributions we can have marginal and conditional distributions. I can intuitively understand those things. However, for expectations I do not see how you can make that difference. Well, you can make the difference and express a marginal expectation like $E_1(y)$, but what does that expectation mean? – Sextus Empiricus Aug 14 '20 at 06:18
  • @SextusEmpiricus Like you, I am someone who is very curious and always tries to understand every little detail about a subject that I am curious about. If there's one thing I learned while trying to improve my knowledge of a subject, it's that "what does it mean" may actually have no well-founded answer at all. Especially when it comes to notation and terminology, many times the answer will depend on how you define something to begin with. – Colin Hicks Oct 24 '20 at 21:26
  • @SextusEmpiricus For example, back in the second year of engineering in our signal processing class, we used the [dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function) like it was a regular function, and I always used to ask "what does this mean" when, say, taking the value of the function at 0. No one could answer the question (and similarly for the step function), because no one had defined it to begin with (we had not learned about distributions). Similarly, your question about what this function means or does will entirely depend on the definition (using sets and measure). – Colin Hicks Oct 24 '20 at 21:30
  • The trouble you are having with grasping "what does it mean" is specifically a terminology issue and how you want to define it. Specifically, the confusion lies with the notation causing conflicting readings (notation and definitions in probability theory are, in my opinion, quite a mess). The best way I can explain $E_1(y)$ is that it treats $h(x,y)$ as a function of just $x$ first. Obviously, $E_1$ would simply be the expected value then. Now, modify the distribution of $h$ with $y$. This leads to a different expected value, hence $E_1$ depends on $y$. – Colin Hicks Oct 24 '20 at 21:36
  • @SextusEmpiricus Sometimes, it's easiest to grasp something by taking it at "face value" and in a purely mechanical nature. Any other prescribed meaning will depend entirely on how you want to define it. Maybe this will be the SextusEmpiricus Expectation :). – Colin Hicks Oct 24 '20 at 21:39
  • @Colin For me, a mathematical function that carries the term 'expectation' needs to relate to some real-world process. I would need to be able to get this expectation by simulating that process. Indeed, in a mechanical nature. And that is what I claim is missing in the expectation $E_1$. There is no mechanical process that can result in this value $E_1$. (Sure, you can interpret it when $X$ and $Y$ are not jointly distributed variables or when they are independent, but then the use of the term 'marginal' density is confusing, as it suggests a joint distribution of $X$ and $Y$.) – Sextus Empiricus Oct 24 '20 at 22:23
  • Should that say "a function of $x$" rather than $X$? – Mateen Ulhaq Aug 13 '21 at 20:12

I just want to add a follow-up to Alecos' great answer. Sometimes the exact random variable (or set of random variables) the expectation is over doesn't matter. For instance,

$$ E_{X\sim P(X)} [X] = E_{X\sim P(X,Y)}[X] $$

In your particular question, I suspect that because $h(X,Y)$ is linear in $X$ and $Y$, you will break it up into the "marginal" expectations $E_X[X]$ and $E_X[Y]$ (and then substitute $Y = X + 1$).
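
The linearity argument for the question's numbers ($X\sim\mathcal{N}(0,1)$, $Y = X + 1$) can be sketched as follows (a Monte Carlo check of my own, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)  # X ~ N(0, 1)
y = x + 1                           # Y = X + 1

# Linearity holds regardless of the dependence between X and Y:
# E[X + Y] = E[X] + E[Y] = 0 + 1 = 1
lhs = np.mean(x + y)
rhs = np.mean(x) + np.mean(y)
print(lhs, rhs)
```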