Say I have a regression with 3 independent variables and I decide to introduce a 4th variable and rerun the regression.
A previous post states that the coefficient on an original variable will change when the new variable is correlated with that original variable AND with the response. (Does adding more variables into a multivariable regression change coefficients of existing variables?)
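To see the claim in action, here is a small simulation sketch (my own illustration, not from the linked post; the coefficients, the 0.8 correlation strength, and the seed are arbitrary choices). The new variable z is built to be correlated with one original regressor and with the response, and the coefficient on that regressor shifts once z enters the model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Three original regressors; z is built to correlate with x1 and to enter y.
x1, x2, x3 = rng.normal(size=(3, n))
z = 0.8 * x1 + rng.normal(size=n)                              # correlated with x1
y = 1 + 2 * x1 - x2 + 0.5 * x3 + 1.5 * z + rng.normal(size=n)  # correlated with y

X = np.column_stack([np.ones(n), x1, x2, x3])
beta_old, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_new, *_ = np.linalg.lstsq(np.column_stack([X, z]), y, rcond=None)

# Coefficient on x1: about 2 + 1.5*0.8 = 3.2 without z, about 2 with z.
print(beta_old[1], beta_new[1])
```

Setting either the 0.8 (so z is uncorrelated with x1) or the 1.5 (so z does not enter y) to zero makes the two printed values agree up to sampling noise, consistent with the claim.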
How do we show that mathematically?
What I know: Let's say the original model is assumed to be $E[Y]=X\beta$ and the new model is $E[Y]=X\beta + Z\gamma$, where the augmented data matrix $(X,Z)$ has full column rank. Then the new vector of OLS coefficient estimates is:
$$\begin{bmatrix}\hat{\gamma}\\\hat{\beta}_{NEW}\end{bmatrix}=\begin{bmatrix}(Z^T(I-P_x)Z)^{-1}Z^T(I-P_x)Y\\\hat{\beta}_{OLD}-(X^TX)^{-1}X^TZ\hat{\gamma}\end{bmatrix}$$
where $I-P_x=I-X(X^TX)^{-1}X^T$ is the projection matrix onto the orthogonal complement of the column space of $X$. This implies that the original coefficients will not change when either of the following holds (the formula and the first case are checked numerically in the sketch after this list):
- The columns of $X$ are orthogonal to the columns of $Z$ ($X^TZ=0$), so the correction term $(X^TX)^{-1}X^TZ\hat{\gamma}$ vanishes. When $X$ contains an intercept, $X^TZ=0$ forces each column of $Z$ to be centered, and orthogonality plus centering means the columns of $X$ and $Z$ are uncorrelated.
- The columns of $Z$ are orthogonal to the residuals from the regression of $Y$ on $X$ ($Z^T(I-P_x)Y=0$), which forces $\hat{\gamma}=0$ and hence $\hat{\beta}_{NEW}=\hat{\beta}_{OLD}$.
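Here is a minimal numpy check of the block formula and of the first condition, under arbitrary simulated data (the dimensions, coefficients, and seed are all illustrative assumptions, and `Z_perp` is just a residualized copy of Z introduced for the check):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 regressors
Z = 0.5 * X[:, [1]] + rng.normal(size=(n, 1))               # correlated with a column of X
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + Z @ np.array([1.5]) + rng.normal(size=n)

# Direct OLS fit of the augmented model E[Y] = X beta + Z gamma.
coef_full, *_ = np.linalg.lstsq(np.column_stack([X, Z]), y, rcond=None)
beta_direct, gamma_direct = coef_full[:4], coef_full[4:]

# The partitioned formulas quoted above.
P_x = X @ np.linalg.solve(X.T @ X, X.T)   # projection onto the column space of X
M = np.eye(n) - P_x                       # I - P_x
gamma_hat = np.linalg.solve(Z.T @ M @ Z, Z.T @ M @ y)
beta_old, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_new = beta_old - np.linalg.solve(X.T @ X, X.T @ Z @ gamma_hat)

assert np.allclose(gamma_hat, gamma_direct)
assert np.allclose(beta_new, beta_direct)

# First bullet in action: residualizing Z against X makes X^T Z_perp = 0,
# so refitting with Z_perp leaves the original coefficients unchanged.
Z_perp = M @ Z
coef_perp, *_ = np.linalg.lstsq(np.column_stack([X, Z_perp]), y, rcond=None)
assert np.allclose(coef_perp[:4], beta_old)
```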
But how do you get to the answer linked above?