
Could anybody show me how @Rob Hyndman calculates the variance of $\hat{y}$ in the following link: Obtaining a formula for prediction limits in a linear model?

[Screenshot of the formula from the linked answer giving $\hat{y}$ and its variance.]

EDIT: Basically I don't understand why $X^*(X'X)^{-1}X'$ is not squared as well, i.e. why we don't get $Var(y_{pred})=Var(\hat{y}+\epsilon)=(X^*\hat{\beta})^2\sigma^2+\sigma^2$

HeyJane
  • Possible duplicate of [Linear regression prediction interval](https://stats.stackexchange.com/questions/33433/linear-regression-prediction-interval) – DeltaIV Oct 07 '17 at 19:21
  • Thank you for providing that link, but I do not think it is a duplicate. I am specifically asking for the calculation to be shown or explained. The answer in the post you link to is very nice indeed, but it too lacks the calculations I am after. – HeyJane Oct 07 '17 at 19:36

2 Answers


The link you provided has a small typo: "...and its variance by..." That expression is not the variance of the fitted value; it is the mean squared prediction error (MSPE), which is strictly larger.

If $\operatorname{Var}(\mathbf{e}) = \sigma^2 I$, and $\mathbf{X}$ is of full rank, then \begin{align*} \operatorname{Var}(\hat{y}) &= \operatorname{Var}\left[\mathbf{X}^*\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{Y}\right] \tag{you}\\ &= \mathbf{X}^*\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\operatorname{Var}\left[\mathbf{Y}\right]\left[\mathbf{X}^*\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\right]'\tag{me} \\ &= \sigma^2 \mathbf{X}^*\left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{X}\left(\mathbf{X}'\mathbf{X}\right)^{-1} \mathbf{X}^{*'} \tag{*}\\ &= \sigma^2 \mathbf{X}^*\left(\mathbf{X}'\mathbf{X}\right)^{-1} \mathbf{X}^{*'} . \end{align*} Also, $$ \operatorname{MSPE}(y_{\text{pred}}) = \operatorname{Var}(y_{\text{pred}} - \hat{y}) = \operatorname{Var}(y_{\text{pred}}) + \operatorname{Var}(\hat{y}) $$ because the future observation is independent of (or at least uncorrelated with) the data used for estimation, and because the prediction error has mean zero.
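Not part of the original derivation, but a quick Monte Carlo sketch (Python, with a made-up design matrix and $\sigma$) makes the two quantities concrete: over repeated samples, the spread of $\hat{y} = \mathbf{X}^*\hat{\beta}$ should match $\sigma^2 \mathbf{X}^*(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}^{*'}$, while the spread of $y_{\text{pred}} - \hat{y}$ picks up the extra $\sigma^2$:

```python
import numpy as np

# Simulation sketch: hold the design X fixed, repeatedly redraw the training data,
# and compare the empirical variances with the formulas above.
rng = np.random.default_rng(0)
n, p, sigma = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # fixed design
beta = np.array([1.0, -2.0, 0.5])
x_star = np.array([1.0, 0.3, -1.2])                 # new row X*

fitted, pred_err = [], []
for _ in range(20000):
    y = X @ beta + sigma * rng.normal(size=n)       # training response
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # OLS estimate
    y_hat = x_star @ beta_hat                       # fitted value at X*
    y_new = x_star @ beta + sigma * rng.normal()    # unseen future observation
    fitted.append(y_hat)
    pred_err.append(y_new - y_hat)

var_theory = sigma**2 * x_star @ np.linalg.solve(X.T @ X, x_star)
print(np.var(fitted), var_theory)                   # Var(y_hat): ~ equal
print(np.var(pred_err), var_theory + sigma**2)      # MSPE: ~ equal
```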

Taylor
  • Is this not the same as what I've written above? That they should be squared? – HeyJane Oct 07 '17 at 19:38
  • @HeyJane see update. I had actually made a mistake. – Taylor Oct 07 '17 at 19:54
  • thanks for this. I did not see the third step happening before now: the cancellation "effect",...Also, should you not "unbold" Y? – HeyJane Oct 07 '17 at 19:59
  • @HeyJane I don't think so: $\mathbf{Y}$ is the big column vector of all your data you use to estimate $\hat{\beta}$. $\hat{y}$ is the scalar fitted value corresponding to the row vector of new data $\mathbf{X}^*$, and $y_{\text{pred}}$ is whatever the unseen scalar forecasted data comes out to be, not included in $\mathbf{Y}$. Because $X$ and $\beta$ are usually assumed constant, $\operatorname{Var}(\mathbf{Y}) = \operatorname{Var}(\mathbf{e}) = \sigma^2 I$. – Taylor Oct 07 '17 at 20:02
  • of course, you are correct, sorry! – HeyJane Oct 07 '17 at 20:03

What seems at first glance a pretty simple equation has always been a bit of a mystery to me. I know this is an old post (asked and edited in 2017, with new activity in 2020), but I'd like to add my take on justifying the equations for the linear model variance.

Explanation below:

Let the linear model be $Y = X\beta + \epsilon_{model}$, where the weights obtained by OLS minimization are: $$\hat{\beta\;} = (X^{T}X)^{-1}X^{T}Y$$

Thus the fitted values for the model can be obtained as: $$Y_{\text{fit}} = X\hat{\beta\;} = X(X^{T}X)^{-1}X^{T}Y$$
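These two formulas translate directly into code; a minimal NumPy sketch (my illustration on made-up data, not part of the original answer):

```python
import numpy as np

# Minimal OLS fit written exactly as in the formulas above.
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])        # intercept + one regressor
Y = X @ np.array([2.0, 0.7]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y                   # (X^T X)^{-1} X^T Y
Y_fit = X @ beta_hat                                          # X (X^T X)^{-1} X^T Y
```

In practice you would use `np.linalg.lstsq` or `np.linalg.solve` rather than forming the inverse explicitly, but the explicit form mirrors the algebra above.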

Therefore the explained variable is $Y = Y_{\text{fit}} + \epsilon$, and its variance is: $$\text{Var}(Y) = \text{Var}(Y_{\text{fit}} + \epsilon)$$

If we assume the fitted values have no correlation with the errors (the errors are the unexplained variance), then $\text{Covar}(Y_{\text{fit}},\epsilon) = 0$, and therefore: $$\text{Var}(Y) = \text{Var}(Y_{\text{fit}} + \epsilon) = \text{Var}(Y_{\text{fit}}) + \text{Var}(\epsilon)$$

And since the fitted values are treated here as non-random, their variance is zero, $\text{Var}(Y_{\text{fit}}) = 0$; thus: $$\text{Var}(Y) = \text{Var}(\epsilon) = \sigma_{res}^{2}I$$

Assuming there is uncertainty in the estimated coefficients, what is the uncertainty in the prediction? (Again assuming no correlation between predictions and errors.) $$\text{Var}(y_{pred}) = \text{Var}(\hat{y} + \epsilon) = \text{Var}(\hat{y}) + \text{Var}(\epsilon)$$

From previous deduction we know $\text{Var}(\epsilon) = \sigma_{res}^{2}I$, so it only remains to assess: $$\text{Var}(\hat{y}) = E[\hat{y}\,\hat{y}^{T}] - E[\hat{y}]E[\hat{y}^{T}]$$

We've defined $$\hat{y} = X\hat{\beta\;}$$

Therefore, \begin{align} \text{Var}(\hat{y}) &= E[X\hat{\beta\;}\hat{\beta\;}^{T}X^{T}] - E[X\hat{\beta\;}]E[\hat{\beta\;}^{T}X^{T}]\\ & = X\,E[\hat{\beta\;}\hat{\beta\;}^{T}]\,X^{T} - X\,E[\hat{\beta\;}]E[\hat{\beta\;}^{T}]\,X^{T}\\ & = X\,(E[\hat{\beta\;}\hat{\beta\;}^{T}] - E[\hat{\beta\;}]E[\hat{\beta\;}^{T}])\,X^{T}\\ & = X\,\text{Var}(\hat{\beta\;})\,X^{T}\\ \\ \text{Var}(\hat{\beta\;}) &= E[\hat{\beta\;}\hat{\beta\;}^{T}] - E[\hat{\beta\;}]E[\hat{\beta\;}^{T}]\\ &=E[(X^{T}\,X)^{-1}\,X^{T}\,Y\,Y^{T}\,X\,(X^{T}\,X)^{-1}] - E[(X^{T}\,X)^{-1}\,X^{T}\,Y]E[Y^{T}\,X\,(X^{T}\,X)^{-1}]\\ &=(X^{T}\,X)^{-1}\,X^{T}\,(E[Y\,Y^{T}] - E[Y]E[Y^{T}])\,X\,(X^{T}\,X)^{-1}\\ &=(X^{T}\,X)^{-1}\,X^{T}\,\text{Var}(Y)\,X\,(X^{T}\,X)^{-1}\\ &=(X^{T}\,X)^{-1}\,X^{T}\, \sigma_{res}^{2}I\,X\,(X^{T}\,X)^{-1}\\ &=\sigma_{res}^{2}(X^{T}\,X)^{-1}\,X^{T}\,X\,(X^{T}\,X)^{-1}\\ &=\sigma_{res}^{2}\,(X^{T}\,X)^{-1}\\ \\ \text{Var}(\hat{y}) & = X\,\text{Var}(\hat{\beta\;})\,X^{T}\\ & = X\,\sigma_{res}^{2}\,(X^{T}\,X)^{-1}\,X^{T}\\ & = \sigma_{res}^{2}X\,(X^{T}\,X)^{-1}\,X^{T} \end{align}
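A small simulation check (my own addition, with a made-up design) of the key intermediate result: the sample covariance of $\hat{\beta\;}$ over many redraws of $Y$ approaches $\sigma_{res}^{2}(X^{T}X)^{-1}$, and $\text{Var}(\hat{y}) = X\,\text{Var}(\hat{\beta\;})\,X^{T}$ then follows from the sandwich step above:

```python
import numpy as np

# Redraw Y many times with the design X held fixed and compare the empirical
# covariance of beta_hat with sigma^2 (X^T X)^{-1}.
rng = np.random.default_rng(2)
n, sigma = 40, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design
beta = np.array([1.0, 2.0])

betas = []
for _ in range(20000):
    Y = X @ beta + sigma * rng.normal(size=n)
    betas.append(np.linalg.solve(X.T @ X, X.T @ Y))
betas = np.array(betas)

print(np.cov(betas, rowvar=False))                       # ~ sigma^2 (X^T X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```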

Consequently: \begin{align} \text{Var}(y_{pred}) &= \text{Var}(\hat{y}) + \text{Var}(\epsilon)\\ &= \sigma_{res}^{2}X\,(X^{T}\,X)^{-1}\,X^{T} + \sigma_{res}^{2}I\\ &= \sigma_{res}^{2}(X\,(X^{T}\,X)^{-1}\,X^{T} + I) \end{align}

And for future predicted values: $$\text{Var}(y_{fcast}) = \sigma_{res}^{2}(X^{*}\,(X^{T}\,X)^{-1}\,X^{*T} + I)$$
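For completeness, here is how the forecast-variance formula is typically used for a single new row $X^{*}$ (the data, the plug-in estimate of $\sigma_{res}^{2}$ and the $t$-quantile interval are my additions, not part of the answer above):

```python
import numpy as np
from scipy import stats

# Fit on made-up data, then form a 95% prediction interval for a new row x_star
# using Var(y_fcast) = sigma_res^2 * (x* (X^T X)^{-1} x*^T + 1).
rng = np.random.default_rng(3)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([0.5, 1.2]) + rng.normal(scale=0.8, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
sigma2_res = resid @ resid / (n - X.shape[1])            # unbiased residual variance

x_star = np.array([1.0, 2.0])                            # new observation X*
y_fcast = x_star @ beta_hat
var_fcast = sigma2_res * (x_star @ XtX_inv @ x_star + 1.0)
t = stats.t.ppf(0.975, df=n - X.shape[1])
print(y_fcast - t * np.sqrt(var_fcast), y_fcast + t * np.sqrt(var_fcast))   # 95% PI
```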

Supporting Results: \begin{align} \text{Var}(X) &= E[XX^{T}] - E[X]E[X]^{T}\\ \text{Covar}(X,Y) &= E[XY^{T}] - E[X]E[Y]^{T}\\ \text{Var}(X \pm Y)&= \text{Var}(X) +\text{Var}(Y) \pm 2\,\text{Covar}(X,Y)\\ \text{Corr}(X,Y) &= \dfrac{\text{Covar}(X,Y)}{\sigma_{X}\sigma_{Y}} \end{align}
Writing $\bar{X} = E[X]$ and $\bar{Y} = E[Y]$: \begin{align} \text{Var}(X+Y) &= E[((X+Y) - E[(X+Y)]) ((X+Y) - E[(X+Y)])^{T}]\\ &= E[((X+Y) - E[X]-E[Y])((X+Y)^{T} - E[X]^{T}-E[Y]^{T})]\\ &= E[(X +Y -\bar{X} -\bar{Y})(X^{T}+Y^{T} - \bar{X}^{T}-\bar{Y}^{T})]\\ &= E[XX^{T}+XY^{T} - X\bar{X}^{T}-X\bar{Y}^{T} + YX^{T}+YY^{T} - Y\bar{X}^{T}-Y\bar{Y}^{T} \\ & \quad -\bar{X}X^{T}-\bar{X}Y^{T} +\bar{X}\bar{X}^{T} +\bar{X}\bar{Y}^{T}-\bar{Y}X^{T}-\bar{Y}Y^{T} +\bar{Y}\bar{X}^{T} +\bar{Y}\bar{Y}^{T}]\\ &= E[XX^{T}]+E[XY^{T}] - \bar{X}\bar{X}^{T}-\bar{X}\bar{Y}^{T} + E[YX^{T}]+E[YY^{T}] - \bar{Y}\bar{X}^{T}-\bar{Y}\bar{Y}^{T} \\ & \quad -\bar{X}\bar{X}^{T}-\bar{X}\bar{Y}^{T} +\bar{X}\bar{X}^{T} +\bar{X}\bar{Y}^{T}-\bar{Y}\bar{X}^{T}-\bar{Y}\bar{Y}^{T} +\bar{Y}\bar{X}^{T} +\bar{Y}\bar{Y}^{T}\\ &= \left(E[XX^{T}] - \bar{X}\bar{X}^{T}\right) +\left(E[YY^{T}]-\bar{Y}\bar{Y}^{T}\right) +\left(E[XY^{T}] -\bar{X}\bar{Y}^{T}\right)+\left(E[YX^{T}] -\bar{Y}\bar{X}^{T}\right)\\ &= \text{Var}(X) +\text{Var}(Y) +2\,\text{Covar}(X,Y) \end{align} (the last step uses $\text{Covar}(Y,X) = \text{Covar}(X,Y)$, which holds exactly in the scalar case used here).
Supporting Algebra results: $$X\,A\,X^{T} \pm X\,B\,X^{T} = X\,(A\pm B)\,X^{T} $$
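And a tiny numerical sanity check of the $\text{Var}(X+Y)$ identity (made-up correlated data, purely for illustration):

```python
import numpy as np

# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on simulated data.
rng = np.random.default_rng(4)
x = rng.normal(size=100_000)
y = 0.6 * x + rng.normal(size=100_000)       # correlated with x
C = np.cov(x, y)                             # 2x2 sample covariance matrix
lhs = np.var(x + y, ddof=1)
rhs = C[0, 0] + C[1, 1] + 2 * C[0, 1]
print(lhs, rhs)                              # identical up to floating point
```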

Alberto GR
  • Interesting. Is there any specific literature you are referring to in your answer? – shenflow Nov 23 '20 at 07:55
  • Not really, @shenflow; you can find plenty of pieces on the web for the different components. Some examples of what I used while doing my research: https://otexts.com/fpp2/regression-matrices.html or http://dept.stat.lsa.umich.edu/~kshedden/Courses/Stat401/Notes/401-slr-slides.pdf. None of the documentation pointed directly to the explanation I wanted, so I pretty much solved this one by hand on my own, which confirms most of the results you see in the literature about OLS variance. – Alberto GR Nov 25 '20 at 20:47
  • @AlbertoGR It is confusing to write $Y=X\beta+\epsilon$ and $Y=Y_{\text{fit}}+\epsilon$. We have the errors of the model in the former equation but we have the residuals of the model in the latter equation and they are not the same thing. – Cm7F7Bb Nov 24 '21 at 10:14
  • @Cm7F7Bb, I can see where you are coming from, I can update the model errors to be $\epsilon_{model}$ and keep $\epsilon$ as the residuals errors, thanks for picking this up – Alberto GR Nov 30 '21 at 07:21