
Say I have two variables $x_1$ and $x_2$, and I build a linear regression model as below

$\hat{Y} = n_1 x_1 + n_2 x_2$

Then I build another model as below

$\hat{Z} = m_1 (x_1 + x_2) + m_2 (x_1 - x_2)$

Intuitively, $\hat{Y}$ should be equal to $\hat{Z}$.

Below is my R code demonstrating the equivalence:

set.seed(1)   # for reproducibility
num = 10
X1 = runif(num)
X2 = runif(num)
Y = runif(num)

mydata <- data.frame(X1, X2, Y)
fit1 = lm(Y ~ X1 + X2, data = mydata)
summary(fit1)

# refit using the transformed regressors X1 + X2 and X1 - X2
mydata <- data.frame(X1 + X2, X1 - X2, Y)
names(mydata)[1] <- 'new_X1'
names(mydata)[2] <- 'new_X2'

fit2 = lm(Y ~ new_X1 + new_X2, data = mydata)
summary(fit2)
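
As a quick numerical check (a sketch using fit1 and fit2 from above), the fitted values of the two models can be compared directly:

# sanity check: the two fits should produce identical fitted values
all.equal(fitted(fit1), fitted(fit2))   # should return TRUE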

My question is: how can one prove the equivalence conceptually?

Bratt Swan
    https://stats.stackexchange.com/questions/540938 is a textbook exercise designed to explore this. For additional approaches see https://stats.stackexchange.com/questions/31858 (a very quick demonstration using matrices) and the remarks at the end of https://stats.stackexchange.com/a/66295/919 (for a geometrical explanation). – whuber Nov 01 '21 at 21:40

1 Answer


Hi: You can prove the equivalence by rewriting your second regression model as

$Z = (m_1 + m_2) \times x_1 + (m_1 - m_2) \times x_2 + \omega$
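
To see why, just expand the products in the second model and collect terms in $x_1$ and $x_2$:

$m_1 \times (x_1 + x_2) + m_2 \times (x_1 - x_2) = (m_1 + m_2) \times x_1 + (m_1 - m_2) \times x_2$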

#==================================================================

EDITED ON 11/01/2021 IN ORDER TO PROVIDE CLARIFICATION BASED ON COMMENT FROM OP THAT AN EXTRA ASSUMPTION IS BEING MADE.

#==================================================================

This way, it looks a lot more like the first regression model

$Y = n_1 \times x_1 + n_2 \times x_2 + \epsilon $

Now, when both models are estimated, the responses $Z$ and $Y$ are identical (it is the same response variable). Also, $n_1$ corresponds to $m_1 + m_2$ and $n_2$ corresponds to $m_1 - m_2$.

So, when the regression using $Z$ is carried out, the coefficients $\widehat{m_1 + m_2}$ and $\widehat{m_1 - m_2}$ are estimated such that the sum of squared deviations of $Z$ from $\hat{Z}$ is minimized.

Similarly, from a least squares standpoint, in the first regression model, one is minimizing the sum of squared deviations of $Y$ from $\hat{Y}$ by finding the coefficient estimates, $\hat{n_1}$ and $\hat{n_2}$.

So, from a system-of-equations perspective (taking the derivatives and setting them to zero), one has two equations and two unknowns in both cases. Therefore, the results of the two minimization procedures have to be identical, in that $\hat{n}_1$ has to correspond to $\widehat{m_1 + m_2}$ and $\hat{n}_2$ has to correspond to $\widehat{m_1 - m_2}$.
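
The same point can be made in matrix form (a sketch, ignoring the intercept as in the models above; here $X$ denotes the $n \times 2$ matrix with columns $x_1$ and $x_2$, and $B = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, so that the second model's regressors are the columns of $XB$). The normal equations of the second model are

$(XB)^\top (XB) \hat{m} = (XB)^\top Y \implies B^\top X^\top X (B \hat{m}) = B^\top X^\top Y \implies X^\top X (B \hat{m}) = X^\top Y$

using that $B$ is invertible. So $B \hat{m}$ solves exactly the normal equations that $\hat{n}$ solves, hence $B \hat{m} = \hat{n}$, i.e. $\hat{m}_1 + \hat{m}_2 = \hat{n}_1$ and $\hat{m}_1 - \hat{m}_2 = \hat{n}_2$, and the fitted values $X B \hat{m} = X \hat{n}$ coincide.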

Does that clarify why the two models are identical? If not, then maybe someone else can give a clearer explanation. In practical terms, the fact that, in the second regression model, the first coefficient is $(m_1 + m_2)$ and the second is $(m_1 - m_2)$ makes no difference to the minimization algorithm: it views them as variables to be estimated. Since the second regression model faces the same minimization problem as the first, the coefficient estimates have to correspond, in that the sum of the coefficients in the second model corresponds to $n_1$ and their difference corresponds to $n_2$.

#=================================================================

EDIT ON 11-02-2021 IN ORDER TO COMMENT ON NICO'S QUESTION REGARDING HOW $m_1$ AND $m_2$ ARE ACTUALLY OBTAINED.

#====================================================================

Nico: Notice that when the R code is run (output shown below), the coefficients reported for the second regression are $m_1$ and $m_2$ directly. So a system of equations for dealing with the dependence is not necessary, because the relation I used to show the equivalence of the regression models is not used in the R code itself. My explanation of using a 2 by 2 system of equations to solve for $m_1$ and $m_2$ AFTERWARDS is only conceptual. The lm call in the second regression model does not need to do anything fancy, because there is no dependence between the two coefficients $m_1$ and $m_2$: I just introduced the dependence to show the equivalence of the two models. I hope that helps.
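
Purely as an illustration of that conceptual back-solve (a sketch in R, reusing fit1 from the code above; the matrix B is my own label for the 2 by 2 system):

# back-solve the 2 by 2 system: n1 = m1 + m2, n2 = m1 - m2
B <- rbind(c(1, 1), c(1, -1))
n <- unname(coef(fit1)[-1])   # (n1, n2), dropping the intercept
solve(B, n)                   # returns (m1, m2), matching coef(fit2)[-1]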

Note that if you take the two coefficients in the second output and form their sum and their difference, you obtain the same coefficients that are obtained in the first regression model.
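
In R, that check looks like this (a quick sketch using fit1 and fit2 from the question's code):

b <- coef(fit2)
b["new_X1"] + b["new_X2"]   # -0.2867, the X1 coefficient of fit1
b["new_X1"] - b["new_X2"]   # -0.3475, the X2 coefficient of fit1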

#====================================================================

Call:
lm(formula = Y ~ X1 + X2, data = mydata)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.3904 -0.2223 -0.0482  0.2495  0.4115 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   0.7706     0.3004   2.566   0.0373 *
X1           -0.2867     0.3326  -0.862   0.4172  
X2           -0.3475     0.3881  -0.895   0.4004  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3149 on 7 degrees of freedom
Multiple R-squared:  0.1815,    Adjusted R-squared:  -0.05238 
F-statistic: 0.776 on 2 and 7 DF,  p-value: 0.4961

> mydata <- data.frame(X1 + X2, X1 - X2, Y)
> names(mydata)[1] <- 'new_X1'
> names(mydata)[2] <- 'new_X2'
> fit2 = lm(Y ~ new_X1 + new_X2, data = mydata)
> summary(fit2)

Call:
lm(formula = Y ~ new_X1 + new_X2, data = mydata)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.3904 -0.2223 -0.0482  0.2495  0.4115 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   0.7706     0.3004   2.566   0.0373 *
new_X1       -0.3171     0.2550  -1.244   0.2536  
new_X2        0.0304     0.2562   0.119   0.9089  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3149 on 7 degrees of freedom
Multiple R-squared:  0.1815,    Adjusted R-squared:  -0.05238 
F-statistic: 0.776 on 2 and 7 DF,  p-value: 0.4961



mlofton
  • This is not what I expected. You are making the assumption that Z_hat is equal to Y_hat before you actually prove they are equivalent, which is not true – Bratt Swan Nov 01 '21 at 03:31
  • Maybe the notation is what is bothersome. See edit. – mlofton Nov 01 '21 at 20:56
  • @mlofton nice explanation. There's still something I can't understand to close the loop, though. In the second model both coefficients (m1 + m2) and (m1 - m2) are dependent, and this isn't true in the first model, where n1 isn't dependent on n2. Why is the optimization result the same in both cases? – Nico Nov 02 '21 at 13:17
  • Thanks Nico. The coefficients are dependent, but when the actual minimization is done, it ends up being an equality ($\hat\beta = (X^\prime X)^{-1} X^\prime Y$) where $X$ is the matrix of independent variables ($n \times 2$, forgetting the intercept). So, it's best to think of it as just solving for 2 coefficients, $\beta_1$ and $\beta_2$. Then, AFTER THAT, the relation $\beta_1 = m_1 + m_2$ and $\beta_2 = m_1 - m_2$ can be used to solve for $m_1$ and $m_2$. Essentially, the dependence of the actual coefficients can be handled by solving a 2 by 2 system of equations afterwards. – mlofton Nov 02 '21 at 18:36
  • Nico: Note that I'm just explaining it in the way above in order to explain the handling of the dependence. I'm not sure how it's actually done inside the lm call. But at least the way I explained it is one way to think about it. The people who wrote lm are all R-core members whose brains operate on a whole different level, so they may do something totally different from the way I explained it. Either way, the dependence is handled in some manner. – mlofton Nov 02 '21 at 18:41
  • Nico: I added another edit in my answer because I realized something after I wrote the comments above. I hope it helps. – mlofton Nov 02 '21 at 19:04
  • Thanks @mlofton ! I think I get it! In my head it all boils down to the feasibility of solving the system of equations at the end with n1 and n2 provided. Provided this is solvable (2 unknowns with 2 equations: OK) then the optimization can find any value of the coefficients in the second regression making both regressions equivalent. – Nico Nov 02 '21 at 23:31
  • Hi Nico: yes, but the thing is, I just wrote it that way (using $m_1 + m_2$ and $m_1 - m_2$) in order to show the model equivalence of the first and second formulations. The actual call to lm in the R code uses the original version of the second formulation that Bratt had. So, in that case, there is nothing to back out afterwards, because $m_1$ and $m_2$ are "independent" algebraically (but the model is equivalent to the one I wrote, where there is dependence). – mlofton Nov 03 '21 at 14:14