
I've already read this: Conditional expectation function

For this question, we assume the familiar notation of linear regression, with $Y$ the response and $X$ the stochastic regressors. I've seen both $E(Y|X)$ and $E(Y|X=x)$ referred to as the "conditional expectation function".

I understand that $E(Y|X)$ is a random variable while $E(Y|X=x)$ is a realization of $E(Y|X)$. That said, in the regression setup we observe realizations of $Y$ and $X$, which we denote $(y, x)$.

Therefore, we condition on the event $X=x$, and we use our data to estimate $E(Y|X=x)$.
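To make the distinction concrete, here is a toy sketch of what I mean, assuming (purely for illustration) the model $Y = 2X + \epsilon$, so that $E(Y|X) = 2X$:

```python
# Toy sketch: the true conditional expectation function is assumed to be
# m(x) = E(Y | X = x) = 2x. The function m is fixed; m(X) is random.
import numpy as np

rng = np.random.default_rng(0)

def cond_exp(x):
    # The fixed function m(x) = E(Y | X = x); here m(x) = 2x by assumption.
    return 2 * x

X = rng.normal(size=5)   # X is random, so cond_exp(X) = E(Y|X) is random too
print(cond_exp(X))       # five different values: realizations of a random variable
print(cond_exp(1.5))     # E(Y | X = 1.5) = 3.0: a fixed number
```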

My question is: when does $E(Y|X)$ come into play at all? Where do we even consider it? Or is it irrelevant, since we've already observed a realization of it?

A great answer would also explain why we call both $E(Y|X)$ and $E(Y|X=x)$ the conditional expectation function, as they are certainly related yet different objects; one is random while the other is deterministic.

Notice that throughout my question we've assumed $X$ is stochastic. Therefore, this is not a question about stochastic vs. fixed regressors.

user303375
  • Is this question about notation (do the two mean the same thing?) or about understanding the concept? – Sextus Empiricus Dec 15 '20 at 08:09
  • I remember recently seeing a question about this difference in notation. The answer said something like: $P(Y|X)$ is more often used for events or categorical variables, as in P(sick | runny nose), while the notation $P(Y|X=x)$ is used to stress that $X$ is a numerical variable, as in P(sick | body temperature). – Sextus Empiricus Dec 15 '20 at 08:09
  • I was asking about both. If you could share the question you saw, in addition to answering this one, that would be great too! – user303375 Dec 15 '20 at 08:14
  • In response to the second comment: the distinction is certainly not between categorical and numerical variables, at least where I've seen it used, particularly in this link: https://www.timlrx.com/2018/02/26/notes-on-regression-approximation-of-the-conditional-expectation-function/ – user303375 Dec 15 '20 at 08:18
  • The link refers to the conditional expectation function $E(Y|X)$ while other sources refer to the conditional expectation function $E(Y|X=x)$. Clearly, they are related but not "the same object". The former is a random variable while the latter is not. Why are both called the conditional expectation function? Which one is the conditional expectation function, and what is the other one? etc. This is where my confusion lies. – user303375 Dec 15 '20 at 08:20
  • $E(Y|X)$ is not a random variable. It is only a random variable when you estimate its value based on some sample. – Sextus Empiricus Dec 15 '20 at 09:54
  • Hmmm... so $\beta_0 + \beta_1 X$, where $\beta_0$ and $\beta_1$ are fixed and $X$ is random, is somehow a fixed quantity? Because under the usual regression assumptions, $E(Y | X ) =\beta_0 + \beta_1 X$. – BigBendRegion Dec 15 '20 at 15:46
  • @BigBendRegion $E(Y|X)$ is a function that describes the expectation of $Y$ as a function of $X$. This $X$ can be a variable but the function is fixed. – Sextus Empiricus Dec 15 '20 at 16:11
  • @SextusEmpiricus Quite literally anywhere you look, you will see $E(Y|X)$ regarded as a random variable. Take https://stats.stackexchange.com/questions/118578/what-is-the-difference-between-exy-and-exy-y for example. Do you disagree here? If so, I'd like to pin down where you disagree; I would like to understand your argument better. – user303375 Dec 15 '20 at 17:08
  • That link is very nice, +1 for Dilip. I do not disagree so much with the explanation there. Indeed, $E(Y|X)$ can be a random variable while $E(Y|X=x)$ is not. However, I do believe you can see it more broadly: the notation $X=x$ versus $X$ is not just about discriminating between random variables and specific outcomes. – Sextus Empiricus Dec 15 '20 at 17:34

1 Answer


In linear regression, we assume the random variable $E(Y|X)$ has the parametric form $X\beta$. When we "fit" the model, we estimate the parameter $\beta$, which, under our assumption, gives us a general formula for $E(Y|X)$. If we made no assumptions whatsoever about the form of $E(Y|X)$, then indeed we would only be able to estimate $E(Y|X=x)$ at our observed realizations $(x,y)$. But that wouldn't be a linear regression model.
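As a minimal sketch of this point (the one-predictor model, the parameter values, and the use of `numpy.linalg.lstsq` here are all illustrative assumptions):

```python
# Sketch: estimate beta under the assumption E(Y|X) = X beta, then evaluate
# the fitted formula at x-values never seen in the sample.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])               # design matrix with intercept
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)  # illustrative true beta = (1, 2)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimate of beta

# Because we estimated the whole formula x -> x @ beta_hat, we can evaluate the
# estimated E(Y | X = x) at any x, including x = -3, outside the sample range:
x_new = np.array([1.0, -3.0])
print(np.column_stack([np.ones(2), x_new]) @ beta_hat)
```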

The fact that one might refer to both $E(Y|X)$ and $E(Y|X=x)$ by the name "conditional expectation function" is just a case of overloaded terminology. The first can be thought of as a function of the random variable $X$; the second is a function of the fixed value $x$. This conflation happens a lot with random variables: one might refer to "the sum of two rolled dice", which can be interpreted either as a function of random variables (the dice themselves being random) or as a function of the fixed outcome of two rolled dice, which is deterministic.
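The dice analogy can be made concrete in the same way (`sum_dice` is a hypothetical helper, just for illustration):

```python
# One fixed function, applied either to random rolls (yielding a random
# variable) or to a fixed observed outcome (yielding a number).
import numpy as np

rng = np.random.default_rng(2)

def sum_dice(d1, d2):
    return d1 + d2

print(sum_dice(rng.integers(1, 7), rng.integers(1, 7)))  # random: depends on the rolls
print(sum_dice(3, 5))                                    # deterministic: always 8
```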

  • Here's my understanding: $E(Y|X)$ appears in the context where $X$ has yet to be observed, i.e. $Y = X\beta + \epsilon$. In a sense, this is the "unconditional" relationship between $Y$, $X$, and $\epsilon$: we have not observed $X$ yet, so both $X$ and $\epsilon$ are still random. $E(Y|X=x)$ appears when $X$ has already been observed as $x$. This gives the "conditional on $X=x$" relationship $Y = x\beta + \epsilon$, where only $\epsilon$ is random. Is this what you were getting at? – user303375 Dec 15 '20 at 18:33
  • No, I would not refer to $E(Y|X)$ as "unconditional". They both reflect a statistic of $Y$ conditioned on $X$. They differ only in whether we are speaking of this relationship generally, or at one specific value. The unconditional expectation $E(Y)$ is likely completely different from $E(Y|X)$. EDIT: To clarify, $Y$ is a random variable with a distribution. It also has a conditional distribution $Y|X$, which is different unless $Y$ and $X$ are independent. We are interested in calculating a statistic of this conditional distribution. – David Foley Dec 15 '20 at 18:42
  • Ah, makes more sense then. Let me rephrase. If we have not yet observed $X$, then $Y = X\beta + \epsilon = E(Y|X) + \epsilon$, where $X$ and $\epsilon$ are random. Once we observe the data, $Y = x\beta + \epsilon = E(Y|X=x) + \epsilon$, where only $\epsilon$ is random. We mainly operate in the second case because we're in the specific situation where $X=x$. It is only because we assume $E(Y|X)$ and $E(Y|X=x)$ are linear in $\beta$ that we can construct estimates for $\beta$, and therefore estimate both functions. I think this is correct? (The sketch below checks this numerically.) – user303375 Dec 15 '20 at 18:57
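A quick Monte Carlo sketch of the two cases in the last comment, assuming an illustrative model $Y = 2X + \epsilon$ with $X$ and $\epsilon$ independent standard normal:

```python
# Before observing X: E(Y|X) = 2X is random, and averaging it over draws of X
# recovers E(Y) (law of total expectation). After observing X = x: only eps is
# random, and E(Y | X = x) = 2x is a fixed number.
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
X = rng.normal(size=N)
eps = rng.normal(size=N)
Y = 2 * X + eps

print(np.mean(2 * X), np.mean(Y))          # both near 0: E[E(Y|X)] = E[Y]

x = 1.5                                    # a fixed observed value of X
print(np.mean(2 * x + rng.normal(size=N))) # near 3.0 = E(Y | X = 1.5)
```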