3

My lecture notes state that $\hat{y} \sim \mathcal{N}(X\beta, \sigma^{2}X(X^{T}X)^{-1}X^{T})$,

but isn't $y \sim \mathcal{N}(X\beta, \sigma^{2}I_n)$?

Which one is accurate, or are they the same thing in different notation?

Math Avengers
    Your question is not clearly stated (the distributional conditions are coherent). Could you rephrase what you are asking? Reviewing https://stats.stackexchange.com/help/quality-standards-error and https://stats.stackexchange.com/help/how-to-ask may help. There is also this search: https://stats.stackexchange.com/search?tab=votes&q=assumption%2A%20residual%2A%20normal%2A%20score%3a1%20answers%3a1%20-logistic. Maybe some of the Q&A there already contains your answer? – Lucas Roberts Oct 02 '18 at 00:27

1 Answer

2

$\mathbf{y \neq \hat y}$

  • $\sigma^2(X^TX)^{-1}$ is the covariance matrix $\text{Cov}(\hat\beta)$ of the estimated coefficients.
  • Then $X \, \text{Cov}(\hat\beta) \, X^T = \sigma^2 X(X^TX)^{-1}X^T$ is the covariance matrix $\Sigma$ of the error in the estimated values $\hat y = X \hat \beta$

    (which are different from the observed sample values $y$).
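
To make the step from $\text{Cov}(\hat\beta)$ to the covariance of $\hat y$ explicit, here is a short derivation sketch. It assumes the usual least-squares estimator $\hat\beta = (X^TX)^{-1}X^Ty$ and $\text{Cov}(y) = \sigma^2 I_n$; the symbol $H$ (the "hat matrix") is notation introduced here, not taken from the question.

$$\hat y = X\hat\beta = X(X^TX)^{-1}X^T y = Hy$$

$$\text{Cov}(\hat y) = H \, \text{Cov}(y) \, H^T = \sigma^2 H H^T = \sigma^2 H = \sigma^2 X(X^TX)^{-1}X^T$$

The third equality uses that $H$ is symmetric and idempotent ($H^T = H$ and $HH = H$). The mean follows in the same way: $E[\hat y] = H X\beta = X\beta$.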

Intuition

Imagine a fitted regression line: it is always less accurate at the ends of the range of $X$, because the uncertainty in the slope matters more the further you are from the center of the data.
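
As a rough numerical illustration of this intuition, the sketch below computes the diagonal of $X(X^TX)^{-1}X^T$ for a straight-line fit. The predictor values and the function name `leverages` are made up for the example; the variance of the $i$-th fitted value is $\sigma^2$ times the $i$-th diagonal entry.

```python
import numpy as np

def leverages(x):
    """Diagonal of X (X^T X)^{-1} X^T for a line with intercept and slope."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # design matrix: intercept, slope
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

x = np.arange(11.0)     # hypothetical predictor values 0, 1, ..., 10
h = leverages(x)        # Var(y_hat_i) = sigma^2 * h[i]
print(np.round(h, 3))   # smallest in the middle, largest at both ends
```

The entries are smallest near the mean of `x` and grow towards both ends, which is the widening uncertainty band around the regression line.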


Different notation

It may become clearer with different notation (using $\hat \mu$ instead of $\hat y$).

  • The true mean (conditional on $X$) is $$\mu_X = X\beta$$ and $y$ and $\hat y$ relate to it in different ways.
    • The estimated mean $\hat y$. It is better to interpret $\hat y$ as the estimated mean $\hat \mu_X$, i.e. the regression line. Then $$\hat \mu_X \sim \mathcal{N}(\mu_X, \sigma^2 X (X^TX)^{-1} X^T )$$ expresses the sampling variation of this regression line around $X\beta$ (how your estimate of the conditional mean $\hat \mu_X$ varies from experiment to experiment, or sample to sample).
    • The sampled data $y$. The interpretations of $y$ and $\hat{y}$ differ: one is a data point, the other is an estimate of the mean. The sampled data points $y$ are distributed around the true mean $X\beta$, and if the errors are independent, homoscedastic and normally distributed, then $$y \sim \mathcal{N}(\mu_X,\sigma^2 I_n)$$
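
The two distributions can be checked with a small simulation. This is only a sketch under made-up settings (sample size, $\sigma$, coefficients and number of replications are all hypothetical): it draws many samples $y$, refits each one, and compares the empirical covariances of $y$ and $\hat y$ with $\sigma^2 I_n$ and $\sigma^2 X(X^TX)^{-1}X^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 8, 1.5                        # hypothetical sample size and error sd
x = np.linspace(0.0, 7.0, n)
X = np.column_stack([np.ones(n), x])     # design matrix: intercept, slope
beta = np.array([1.0, 2.0])              # hypothetical true coefficients
H = X @ np.linalg.inv(X.T @ X) @ X.T     # maps y to y_hat

reps = 100_000
# Each row is one simulated sample y ~ N(X beta, sigma^2 I_n)
Y = X @ beta + sigma * rng.standard_normal((reps, n))
# H is symmetric, so row-wise Y @ H equals (H y)^T for every sample
Y_hat = Y @ H

cov_y = np.cov(Y, rowvar=False)
cov_y_hat = np.cov(Y_hat, rowvar=False)

err_y = np.abs(cov_y - sigma**2 * np.eye(n)).max()
err_y_hat = np.abs(cov_y_hat - sigma**2 * H).max()
err_mean = np.abs(Y_hat.mean(axis=0) - X @ beta).max()
print(err_y, err_y_hat, err_mean)        # all should be close to zero
```

Both $y$ and $\hat y$ are centered on $X\beta$, but their covariance matrices differ: the data points scatter independently with variance $\sigma^2$, while the fitted values are correlated and vary less.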

In the case of prediction, if you want the error of a predicted new value, you would use the sum of the two variances expressed above (the variance of the estimated mean plus the variance of a sampled value around the mean).
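
A minimal sketch of that sum, assuming the new observation's error is independent of the sample used for the fit and that $\sigma^2$ is known. The function name `variance_components`, the point `x0` and the numbers are hypothetical.

```python
import numpy as np

def variance_components(X, x0, sigma2):
    """Return (variance of the estimated mean at x0,
               variance of the prediction error for a new value at x0)."""
    X = np.asarray(X, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    # sigma^2 * x0^T (X^T X)^{-1} x0: uncertainty of the regression line at x0
    var_mean = sigma2 * (x0 @ np.linalg.solve(X.T @ X, x0))
    # add the variance of a single new observation around the true mean
    var_pred = sigma2 + var_mean
    return var_mean, var_pred

x = np.arange(11.0)                              # hypothetical predictor values
X = np.column_stack([np.ones_like(x), x])
var_mean, var_pred = variance_components(X, x0=[1.0, 12.0], sigma2=4.0)
print(var_mean, var_pred)
```

In practice $\sigma^2$ is replaced by its estimate from the residuals, and intervals are then based on the $t$ distribution rather than the normal.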

Sextus Empiricus