
I'm working through a centered OLS problem.

If $X$ has an intercept column, the model $y = X\beta + \epsilon$ can be rewritten as $y = X_c\beta_c + \gamma_0\vec{1} + \epsilon$, where $X_c$ is the centered design matrix.

My question is twofold:

Why is it true that $\hat{\gamma}_0=\bar{y}$, and why are the normal equations for centered OLS $(X_c^\top X_c)\hat{\beta}_c=X_c^\top y$ rather than $(X_c^\top X_c)\hat{\beta}_c=X_c^\top(y-\bar{y}\vec{1})$?

I would think that if $y-\bar{y}\vec{1} = X_c\beta_c + \epsilon$ were our model (taking $\hat{\gamma}_0=\bar{y}$ as given), then the normal equations would be $(X_c^\top X_c)\hat{\beta}_c = X_c^\top(y-\bar{y}\vec{1})$.

Where am I going wrong?

Glassjawed

1 Answer


$\hat{\gamma}_0$ and $\hat{\beta}_c$ are the minimizers of a single optimization problem. If we knew $\hat{\beta}_c$, then $\hat{\gamma}_0$ would be the minimizer of $\gamma \mapsto \|y - X_c \hat{\beta}_c - \gamma\vec{1}\|^2$, which is $\frac{1}{n}\sum_i (y_i - (X_c \hat{\beta}_c)_i)$. But since $X_c$ is centered, this is simply $\bar{y}$. (Note that we did not even need to compute $\hat{\beta}_c$.)
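Writing out the first-order condition in $\gamma$ makes the cancellation explicit:
$$
\frac{\partial}{\partial \gamma}\,\|y - X_c \hat{\beta}_c - \gamma \vec{1}\|^2 = -2\,\vec{1}^\top\!\left(y - X_c \hat{\beta}_c - \gamma \vec{1}\right) = 0
\quad\Longrightarrow\quad
\hat{\gamma}_0 = \frac{1}{n}\vec{1}^\top\!\left(y - X_c \hat{\beta}_c\right) = \bar{y} - \frac{1}{n}\vec{1}^\top X_c \hat{\beta}_c = \bar{y},
$$
where the last equality uses $\vec{1}^\top X_c = 0$ (each column of $X_c$ has mean zero).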

Similarly, if we knew $\hat{\gamma}_0$ then $\hat{\beta}_c$ would be the minimizer of $\beta \mapsto \|y - \hat{\gamma}_0 \vec{1} - X_c \beta\|^2$ which would satisfy the normal equation $X_c^\top (y - \hat{\gamma}_0 \vec{1}) = X_c^\top X_c \beta$. But since $X_c$ is centered, this is $X_c^\top y = X_c^\top X_c \beta$.
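To address the second question directly: subtracting $\bar{y}\vec{1}$ from $y$ changes nothing on the right-hand side, because
$$
X_c^\top (y - \bar{y}\vec{1}) = X_c^\top y - \bar{y}\, X_c^\top \vec{1} = X_c^\top y,
$$
again using $X_c^\top \vec{1} = 0$. So the two sets of normal equations you wrote down are the same equations; nothing is going wrong.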


Note that these results show the coefficients are obtained simply by performing least squares with respect to $X_c$ and with respect to $\vec{1}$ separately. This is possible because $\text{colspace}(X_c)$ and $\vec{1}$ are orthogonal. (Think about how one projects a vector $y$ onto a span of orthogonal vectors.)
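If you want a numerical sanity check, here is a minimal NumPy sketch (the data and variable names are made up for illustration) showing that the centered slopes match the slopes from a plain OLS fit with an intercept column, that the centered intercept is $\bar{y}$, and that using $X_c^\top y$ or $X_c^\top(y - \bar{y}\vec{1})$ gives the same $\hat{\beta}_c$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

# Synthetic data, purely for illustration.
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=n)

# Plain OLS with an explicit intercept column.
X1 = np.column_stack([np.ones(n), X])
beta_full, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Centered OLS: center the predictors, solve the normal equations with y itself.
Xc = X - X.mean(axis=0)
beta_c = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)
gamma0 = y.mean()

# Using y - ybar on the right-hand side gives the identical solution.
beta_c_alt = np.linalg.solve(Xc.T @ Xc, Xc.T @ (y - y.mean()))

print(np.allclose(beta_c, beta_c_alt))        # True: the two normal equations coincide
print(np.allclose(beta_full[1:], beta_c))     # True: slopes agree with the uncentered fit
print(np.isclose(beta_full[0],
                 gamma0 - X.mean(axis=0) @ beta_c))  # True: intercepts agree after un-centering
```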

angryavian