Say I have a regression with 3 independent variables and I decide to introduce a 4th variable and rerun the regression.
A previous post states that the coefficient on an original variable will change when the new variable is correlated with that original variable AND with the response. (Does adding more variables into a multivariable regression change coefficients of existing variables?)
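To see the claim in action, here is a small simulation sketch (my own illustration, not from the linked post; the coefficients, the 0.8 correlation strength, and the seed are arbitrary choices). The new variable z is built to be correlated with one original regressor and with the response, and the coefficient on that regressor shifts once z enters the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Three original regressors; z is built to correlate with x1 and to enter y.
x1, x2, x3 = rng.normal(size=(3, n))
z = 0.8 * x1 + rng.normal(size=n)                              # correlated with x1
y = 1 + 2 * x1 - x2 + 0.5 * x3 + 1.5 * z + rng.normal(size=n)  # correlated with y

X = np.column_stack([np.ones(n), x1, x2, x3])
beta_old, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_new, *_ = np.linalg.lstsq(np.column_stack([X, z]), y, rcond=None)

# Coefficient on x1: about 2 + 1.5*0.8 = 3.2 without z, about 2 with z.
print(beta_old[1], beta_new[1])
```

Setting either the 0.8 (so z is uncorrelated with x1) or the 1.5 (so z does not enter y) to zero makes the two printed values agree up to sampling noise, consistent with the claim.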
How do we show that mathematically?
What I know: Let's say the original model is assumed to be $E[Y]=X\beta$ and the new model is $E[Y]=X\beta + Z\gamma$, where the augmented data matrix $(X,Z)$ has full column rank. Then the new vector of OLS coefficient estimates is:
$$\begin{bmatrix}\hat{\gamma}\\\hat{\beta}_{NEW}\end{bmatrix}=\begin{bmatrix}(Z^T(I-P_x)Z)^{-1}Z^T(I-P_x)Y\\\hat{\beta}_{OLD}-(X^TX)^{-1}X^TZ\hat{\gamma}\end{bmatrix}$$
where $I-P_x=I-X(X^TX)^{-1}X^T$ is the projection matrix onto the orthogonal complement of the column space of $X$. This implies that the original coefficients will not change when either of the following holds (the formula and the first case are checked numerically in the sketch after this list):
- The columns of $X$ are orthogonal to the columns of $Z$ ($X^TZ=0$), so the correction term $(X^TX)^{-1}X^TZ\hat{\gamma}$ vanishes. When $X$ contains an intercept, $X^TZ=0$ forces each column of $Z$ to be centered, and orthogonality plus centering means the columns of $X$ and $Z$ are uncorrelated.
- The columns of $Z$ are orthogonal to the residuals from the regression of $Y$ on $X$ ($Z^T(I-P_x)Y=0$), which forces $\hat{\gamma}=0$ and hence $\hat{\beta}_{NEW}=\hat{\beta}_{OLD}$.
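Here is a minimal numpy check of the block formula and of the first condition, under arbitrary simulated data (the dimensions, coefficients, and seed are all illustrative assumptions, and `Z_perp` is just a residualized copy of Z introduced for the check):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 regressors
Z = 0.5 * X[:, [1]] + rng.normal(size=(n, 1))               # correlated with a column of X
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + Z @ np.array([1.5]) + rng.normal(size=n)

# Direct OLS fit of the augmented model E[Y] = X beta + Z gamma.
coef_full, *_ = np.linalg.lstsq(np.column_stack([X, Z]), y, rcond=None)
beta_direct, gamma_direct = coef_full[:4], coef_full[4:]

# The partitioned formulas quoted above.
P_x = X @ np.linalg.solve(X.T @ X, X.T)   # projection onto the column space of X
M = np.eye(n) - P_x                       # I - P_x
gamma_hat = np.linalg.solve(Z.T @ M @ Z, Z.T @ M @ y)
beta_old, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_new = beta_old - np.linalg.solve(X.T @ X, X.T @ Z @ gamma_hat)

assert np.allclose(gamma_hat, gamma_direct)
assert np.allclose(beta_new, beta_direct)

# First bullet in action: residualizing Z against X makes X^T Z_perp = 0,
# so refitting with Z_perp leaves the original coefficients unchanged.
Z_perp = M @ Z
coef_perp, *_ = np.linalg.lstsq(np.column_stack([X, Z_perp]), y, rcond=None)
assert np.allclose(coef_perp[:4], beta_old)
```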
But how do you get to the answer linked above?