4

Suppose the model is $$ Y = b_0 + b_1X_1 + b_2X_2 + b_3D + b_4X_1D + e, \qquad e \sim\mathcal N(0, \sigma^2) $$ where $D$ is a categorical variable. What are $$ E(Y|X_1, X_2, D=1) \sim\ ?? \\ E(Y|X_1, X_2, D=0) \sim\ ?? $$ I want the sampling distribution that incorporates the uncertainty in the estimates of $b$ and also the leverage in $X$, and I'm guessing it has something to do with how many observations are in each group of $D$.
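For concreteness, here is a minimal simulation of this model in Python/NumPy (the sample size and coefficient values are arbitrary illustrative choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative values
n = 200
b = np.array([1.0, 2.0, -1.0, 0.5, 0.3])   # (b0, b1, b2, b3, b4)
sigma = 1.0

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
d = rng.integers(0, 2, size=n)             # categorical D in {0, 1}

# Design matrix with columns (1, X1, X2, D, X1*D)
W = np.column_stack([np.ones(n), x1, x2, d, x1 * d])
y = W @ b + rng.normal(scale=sigma, size=n)

# OLS fit; the question asks for the sampling distribution of W @ b_hat
b_hat, *_ = np.linalg.lstsq(W, y, rcond=None)
```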

wolfsatthedoor
  • What is the distribution of e? – gung - Reinstate Monica Oct 31 '14 at 00:17
  • Nevertheless, some assumption or other is necessary to compute what you asked for. – Glen_b Oct 31 '14 at 01:20
  • If you want to incorporate uncertainty in estimated regression coefficients and leverage, you will probably end up with a scaled and shifted $t$ distribution rather than the normal. – StasK Oct 31 '14 at 02:12
  • Sounds like no one knows the answer? This seems like a pretty basic question to ask, is it not? @StasK, what do you mean, "If you want to"? You have to follow the assumptions; this isn't a matter of preferences. The model has the standard uncertainty of an $E(Y|X)$ as in the basic OLS model. Anyway, Glen, what is the answer then? I can find the answer for simple OLS but not for multiple regression. – wolfsatthedoor Oct 31 '14 at 02:23
  • This should be in any decent regression textbook. The OLS estimator is unbiased, so if e is normal, the sampling distributions of the estimated means will be normal and centered on the true conditional means. You only need the variance (which unfortunately I don't remember, & I don't have my reg text nearby). Based on my answer [here](http://stats.stackexchange.com/a/33642/7290), I suspect you could subtract the estimated error variance from the RHS of the last equation. – gung - Reinstate Monica Oct 31 '14 at 02:55
  • This is not a prediction interval. I am asking for the distribution of E[Y|X] not Y|X though. Also, yes, you would think it would be in any textbook but I looked through several and could not find it. – wolfsatthedoor Oct 31 '14 at 02:56
  • Also, your answer, gung, is for simple OLS; for multiple regression with an interaction I imagine there might be some extra terms in there. Any advice on how to go about deriving it? I guess it is the distribution of a sum of $t$-distributed variables and a normal, right? What is the distribution of such a RV? Need convolutions? – wolfsatthedoor Oct 31 '14 at 02:59
  • I suspect nobody is yet sure what you are asking. The notation suggests both the $X_i$ and $D$ are random variables but the model explicitly makes only $e$ random. Please edit the question to clarify the meanings of your notation. – whuber Oct 31 '14 at 13:37
  • @robbieboy74 this isn't a prediction interval; it's a confidence interval. – shadowtalker Oct 31 '14 at 17:24
  • @ssdecontrol, that's what I said...? – wolfsatthedoor Nov 01 '14 at 02:11

3 Answers

2

Assuming correct specification, $$E(Y|X_1, X_2, D=1) = b_0 + b_1X_1 + b_2X_2 + b_3 + b_4X_1 + E(e|X_1, X_2, D=1)$$

and under the benchmark assumption of strict exogeneity of regressors with respect to the error term,

$$E(Y\mid X_1, X_2, D=1) = (b_0 + b_3) +(b_1+b_4)X_1 + b_2X_2 $$

or more compactly, setting

$$\mathbf \gamma = (\gamma_0, \gamma_1, \gamma_2)',\;\; \gamma_0= b_0 + b_3,\;\;\gamma_1= b_1 + b_4,\;\;\gamma_2= b_2 $$

and

$$Z = (1, X_1, X_2)'$$

$$\Rightarrow E(Y\mid Z, D=1) = Z'\mathbf \gamma $$

and analogously for the other case. Viewed as a random variable, this conditional expectation is a linear combination of $X_1$ and $X_2$, so in order to discuss its distribution we have to know, or make assumptions about, the distribution and dependence structure of the $X$-regressors, something that in many cases is not done. Note that $D$ plays no role here, since the conditioning fixes it to a specific value.

Assume now that we want to consider another random variable: the conditional expectation estimated by the method of moments (here, OLS) from a sample of size $n$, which is treated as fixed. Here the $D$ variable will play a role, since it will be used in the estimation of the parameter vector $\beta = (b_0, b_1, b_2, b_3, b_4)'$. Denote $W= (1, X_1, X_2, D, X_1D)'$ and let $\mathbf W_n$ be the corresponding sample regressor matrix.
We estimate the original model, obtain $\hat \beta$, and from it obtain $\hat \gamma$. Then

$$\hat E_n(Y\mid W, D=1, \mathbf W_n) = W'|_{D=1}\hat \beta = Z'\gamma +W'|_{D=1}\left(\mathbf W_n'\mathbf W_n\right)^{-1}\mathbf W_n'\mathbf e_n$$

Compacting, $$\left(\mathbf W_n'\mathbf W_n\right)^{-1}\mathbf W_n'\mathbf e_n = \mathbf u_n $$

So we can write

$$\hat E_n(Y\mid W, D=1, \mathbf W_n) = E(Y\mid Z, D=1)+W'|_{D=1}\mathbf u_n$$

which looks like it may have an even more complicated distribution, since here we also have products of random variables.

It doesn't look like a basic question to me. But is this what the OP had in mind? In any case, the treatment here is consistent with the notation used in the question.
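As a quick sanity check of the final decomposition, the following sketch (Python/NumPy; all numeric values are my own illustrative choices, not part of the answer) draws one sample and verifies that $\hat E_n(Y\mid W, D=1, \mathbf W_n)$ equals $E(Y\mid Z, D=1)+W'|_{D=1}\mathbf u_n$ exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (arbitrary values)
n = 500
b = np.array([1.0, 2.0, -1.0, 0.5, 0.3])   # beta = (b0, b1, b2, b3, b4)
sigma = 1.0

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
d = rng.integers(0, 2, size=n)
Wn = np.column_stack([np.ones(n), x1, x2, d, x1 * d])  # sample regressor matrix
e = rng.normal(scale=sigma, size=n)
y = Wn @ b + e

b_hat = np.linalg.solve(Wn.T @ Wn, Wn.T @ y)           # OLS estimate of beta

# Evaluation point with D = 1: W|_{D=1} = (1, X1, X2, 1, X1)
X1, X2 = 0.7, -0.4
w = np.array([1.0, X1, X2, 1.0, X1])

# u_n = (Wn'Wn)^{-1} Wn' e_n
u = np.linalg.solve(Wn.T @ Wn, Wn.T @ e)

lhs = w @ b_hat            # estimated conditional mean, W'|_{D=1} beta_hat
rhs = w @ b + w @ u        # E(Y|Z, D=1) + W'|_{D=1} u_n  (w @ b equals Z' gamma)
print(np.isclose(lhs, rhs))  # True: the decomposition holds exactly
```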

Alecos Papadopoulos
2

Let $\left(a, b \right)$ denote the column vector $\left[\matrix{a & b}\right]^T$.

Assume that there exists a $k$-dimensional vector of "true" parameter values $b$, and that the "true" data generating process is described by $y = xb + e$ for any $k$-dimensional row vector $x$, where $e \sim \mathcal{N}{\left( 0, \sigma^2 \right)}$.

Suppose we use OLS to estimate the model parameters with $n$ data points "stacked" to form the $n \times k$ matrix $X = \left(\matrix{x^1, \dots, x^n }\right)$, each row of which is one observed $x^i$. Denote the corresponding parameter estimates by $\hat{b}$.

If we observe some $x$ and let $\hat{y} = x\hat{b}$ denote the estimate $\widehat{\mathbb{E}{\left( y\ |\ x \right)}}$, then $$ \mathbb{E}{\left( \hat{y} \right)} = xb \\ \mathbb{V}\left( \hat{y} \right) = \sigma^2x(X^TX)^{-1}x^T $$

Then for some $\alpha \in (0,1)$, the interval $$ \hat{y} \pm \left(F^t_{n-k}\right)^{-1}{\left(1 - \frac{\alpha}{2}\right)}\sqrt{s^2x(X^TX)^{-1}x^T} $$

contains the true $\mathbb{E}{(y\ |\ x)}$ with probability $1-\alpha$, where $\left(F^t_{n-k}\right)^{-1}$ is the inverse CDF of the $t$ distribution with $n-k$ degrees of freedom ($k$ counts every column of $X$, including the intercept) and $s^2$ is the usual unbiased estimate of $\sigma^2$.

This is laid out, with slightly different notation, in:

Mendenhall, William, and Terry Sincich (2012). "Appendix B: The Mechanics of Multiple Regression Analysis." In *A Second Course in Statistics: Regression Analysis*, 7th ed. (pp. 742–744). Prentice Hall.


In your case $$ x = \left[\matrix{1 & x_1 & x_2 & d & x_1d}\right] \\ b = \left(b_0, b_1, b_2, b_3, b_4\right) \\ \hat{y} = \hat{b_0} + \hat{b_1}x_1 + \hat{b_2}x_2 + \hat{b_3}d + \hat{b_4}x_1d $$ but that all just plugs into the math above. So no, the interaction doesn't make a difference.
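A minimal sketch of this interval in code (Python/SciPy; the simulated data and the evaluation point $x$ are my own illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated data (arbitrary illustrative values)
n = 200
b = np.array([1.0, 2.0, -1.0, 0.5, 0.3])
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
d = rng.integers(0, 2, size=n)
X = np.column_stack([np.ones(n), x1, x2, d, x1 * d])   # n x k, k = 5
y = X @ b + rng.normal(size=n)

k = X.shape[1]
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_hat
s2 = resid @ resid / (n - k)                           # unbiased estimate of sigma^2

# Point of interest: x = [1, x1, x2, d, x1*d]
x = np.array([1.0, 0.7, -0.4, 1.0, 0.7])

y_hat = x @ b_hat
se = np.sqrt(s2 * (x @ np.linalg.solve(X.T @ X, x)))   # sqrt(s^2 x (X'X)^{-1} x')
t_crit = stats.t.ppf(0.975, df=n - k)                  # alpha = 0.05
print(y_hat - t_crit * se, y_hat + t_crit * se)        # 95% CI for E(y | x)
```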

shadowtalker
2

A confusing thing is determining what is random in the expression $E(Y|X)$, but assuming that what you want is $$X\hat \beta,$$ where $X$ is a (row) vector of covariate values, then $\hat \beta$ is the only random part. Now of course if we estimate $\hat \beta$ on data $y = Z \beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$, then $$ \begin{align*} \hat \beta & = (Z^T Z)^{-1}Z^T y\\ & = (Z^T Z)^{-1}Z^T (Z \beta + \epsilon) \\ & = \beta + (Z^T Z)^{-1}Z^T \epsilon \end{align*} $$ so if the variance $\sigma^2$ were known, you'd simply have $$ \begin{align*} X \hat \beta & = X\beta + X (Z^T Z)^{-1}Z^T \epsilon\\ &= X \beta + v \end{align*} $$ where $v \sim N(0, \Lambda)$ for $\Lambda = \sigma^2 X (Z^T Z)^{-1}Z^T Z (Z^T Z)^{-1} X^T = \sigma^2 X(Z^T Z)^{-1}X^T$. So if you knew the variance $\sigma^2$, the sampling distribution would be normal, with the mean that you expect, and a variance that depends on how much information was in the design for the value you are interested in extrapolating to.

But since you don't know $\sigma^2$, you will use the consistent estimator $\hat{s}^2 = \frac{1}{n-p}\sum_i (y_i - z_i \hat \beta)^2$, for which $(n-p)\hat{s}^2/\sigma^2$ is chi-square with $n-p$ degrees of freedom, independent of $\hat \beta$. In this case, $$ \frac{v}{\hat s} \sim \sqrt{X(Z^T Z)^{-1}X^T}\; t_{n-p}, $$ where $t_{n-p}$ is a $t$-distribution with $n-p$ degrees of freedom.

So finally, $$ \frac{X \hat \beta - X \beta}{\hat s\,\sqrt{X(Z^T Z)^{-1}X^T}} \sim t_{n-p}, $$ i.e., the sampling distribution of $X\hat\beta$ is a scaled and shifted $t$-distribution, as anticipated in the comments.
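A small simulation (Python/SciPy; the design, $\beta$, and evaluation point are arbitrary illustrative choices) can confirm that the studentized quantity behaves like $t_{n-p}$ under repeated sampling:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Fixed design Z and true beta (arbitrary illustrative values)
n, p = 50, 3
Z = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.0
x = np.array([1.0, 0.3, 1.5])          # row vector X of interest

G = np.linalg.inv(Z.T @ Z)
scale = np.sqrt(x @ G @ x)             # sqrt(X (Z'Z)^{-1} X')

# Repeated sampling: (X beta_hat - X beta) / (s_hat * scale) should be t_{n-p}
reps = 20000
pivots = np.empty(reps)
for r in range(reps):
    y = Z @ beta + rng.normal(scale=sigma, size=n)
    beta_hat = G @ Z.T @ y
    s = np.sqrt(np.sum((y - Z @ beta_hat) ** 2) / (n - p))
    pivots[r] = (x @ beta_hat - x @ beta) / (s * scale)

# Empirical quantiles vs. the t_{n-p} reference
qs = [0.05, 0.5, 0.95]
print(np.quantile(pivots, qs))
print(stats.t.ppf(qs, df=n - p))       # should be close
```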

Andrew M