
I was asked this during an interview, and I'm curious if my thinking is correct.

Fit a separate simple linear regression to each of two features, $x_1$ and $x_2$. You get two coefficients, $\beta_1$ and $\beta_2$, both greater than $1$. Now fit a single linear regression on both features at the same time. Can either coefficient be negative?

My intuition is that yes, a coefficient's sign can flip if $x_1$ and $x_2$ are highly collinear. The OLS estimates are unstable in that case because the normal equations require inverting the Gram matrix $\mathbf{X}^{\top} \mathbf{X}$, which has the same rank as $\mathbf{X}$ and so becomes nearly singular when the columns of $\mathbf{X}$ are nearly linearly dependent. (1) Am I correct, and (2) if so, is my analysis thorough? I'm not sure whether there's anything else I should consider here, or a better way to explain why the coefficients can flip signs.

jds

1 Answer


Yes, a coefficient can flip sign when the predictors are correlated. Arguing this mathematically is possible, but we can simply demonstrate that it happens with a simulation.
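For intuition, here is the standard closed-form solution for OLS with two standardized predictors (a well-known identity, not something specific to this problem):

$$
\hat\beta_1 = \frac{r_{y1} - r_{12}\, r_{y2}}{1 - r_{12}^{2}}, \qquad
\hat\beta_2 = \frac{r_{y2} - r_{12}\, r_{y1}}{1 - r_{12}^{2}},
$$

where $r_{12}$ is the correlation between the predictors and $r_{y1}, r_{y2}$ are their correlations with $y$. When $r_{12}$ is close to $1$, the numerator $r_{y2} - r_{12}\, r_{y1}$ can be negative even though $r_{y2} > 0$ on its own, which is exactly the sign flip in question.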

```r
set.seed(0)
# Generate two highly correlated covariates
X = MASS::mvrnorm(100, c(0, 0), matrix(c(1, 0.99, 0.99, 1), nrow = 2))
# Use them to generate observations. Only the first column affects y.
y = X %*% c(2, 0) + rnorm(100, 0, 0.4)

# Estimate 3 models: two with a single predictor and one with both
m1 = lm(y ~ X[, 1])
coef(m1)
#> (Intercept)      X[, 1]
#>  0.02606534  2.03186570

m2 = lm(y ~ X[, 2])
coef(m2)
#> (Intercept)      X[, 2]
#>  0.04038971  1.96816682

m = lm(y ~ X)
coef(m)
#> (Intercept)          X1          X2
#>     0.02581     2.07047    -0.03831
```
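Consistent with this, the sign flip can be recovered from the closed-form two-predictor solution using only the sample correlations. This is a sketch that reuses the simulated `X` and `y` above; the names `r12`, `ry1`, `ry2`, `b1`, `b2` are my own, and the coefficients are on the standardized scale, so only their signs are directly comparable to `lm`'s output.

```r
# Pairwise correlations among the predictors and with the response
r12 <- cor(X[, 1], X[, 2])
ry1 <- cor(y, X[, 1])
ry2 <- cor(y, X[, 2])

# Closed-form standardized OLS coefficients for two predictors
b1 <- (ry1 - r12 * ry2) / (1 - r12^2)
b2 <- (ry2 - r12 * ry1) / (1 - r12^2)

# b2 is negative even though cor(y, X[, 2]) is strongly positive
c(b1 = b1, b2 = b2, ry2 = ry2)
```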
Demetri Pananos