
I was asked this in an interview. You have two features, $x_1$ and $x_2$. You fit a simple linear model on each feature, so

$$ \underbrace{y = x_1 \beta}_{\text{model 1}}, \qquad \underbrace{y = x_2 \beta}_{\text{model 2}}. $$

You compute the $R^2$ value for each model and find that each one has an $R^2$ of $0.1$. Now you fit a model on both features

$$ y = x_1 \beta_1 + x_2 \beta_2. $$

What range of values can you expect this model's $R^2$ to take? I was stumped and am curious how to reason through this.

jds
  • Between 0.1 and 0.2 – user158565 Dec 31 '19 at 17:24
  • 1
    Please add the `[self-study]` tag & read its [wiki](https://stats.stackexchange.com/tags/self-study/info). Can you find a lower bound to the $R^2$ of the larger model? – Stephan Kolassa Dec 31 '19 at 17:25
  • 1
    @user158565 please give an explanation of how you got that answer. – Dave Dec 31 '19 at 17:30
  • Is it important that both models have no intercept (or the same intercept)? – Sal Mangiafico Dec 31 '19 at 18:18
  • 2
    @user158565 To see why that answer might be wrong, consider the $(x_1,x_2,y)$ dataset $(1,2.1,2),$ $(3,6.1,0)$ and bear in mind these models explicitly have no intercepts. I chose these data to conform to the assumptions, but since $X_1$ and $X_2$ are linearly independent, $R^2$ for the full model is $1.$ – whuber Dec 31 '19 at 20:12
  • @whuber The data in your current example is 2 x 3. Could you create one that is 3 x 3? – user158565 Dec 31 '19 at 21:41
  • 1
    @user158565 Sure--but there's no need to, because a counterexample is a counterexample. If you like, just duplicate one of the observations. That will change the individual $R^2$ values, but not by enough to make a difference. – whuber Dec 31 '19 at 22:28
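The counterexample in the comments can be checked numerically. This is a sketch assuming the no-intercept definition $R^2 = 1 - \mathrm{SSR}/\sum y_i^2$; the helper name `r2_no_intercept` is mine:

```python
import numpy as np

# whuber's data: two observations (x1, x2, y) = (1, 2.1, 2) and (3, 6.1, 0)
x1 = np.array([1.0, 3.0])
x2 = np.array([2.1, 6.1])
y = np.array([2.0, 0.0])

def r2_no_intercept(X, y):
    """R^2 of a no-intercept least-squares fit: 1 - SSR / sum(y^2)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / (y @ y)

r2_1 = r2_no_intercept(x1[:, None], y)                    # model 1: y = x1 * beta
r2_2 = r2_no_intercept(x2[:, None], y)                    # model 2: y = x2 * beta
r2_full = r2_no_intercept(np.column_stack([x1, x2]), y)   # full model
print(r2_1, r2_2, r2_full)  # ~0.1, ~0.1, 1.0
```

With two observations and two linearly independent columns, the full model fits $y$ exactly, so its $R^2$ is $1$ even though each single-feature $R^2$ is (approximately) $0.1$.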

1 Answer


A simple approach to this problem is to consider it from a geometric point of view.

First, we know the answer lies in the range $[0.1, 1]$: adding a regressor can only enlarge the column space of the design matrix, so $R^2$ cannot drop below the single-feature value of $0.1$. Let's check whether these bounds are tight.

Recall that least-squares regression is the projection of $y$ onto the column space of $X$.

If $x_1$ and $x_2$ are almost perfectly linearly dependent, and the small perturbation separating them is nearly orthogonal to $y$, the projection of $y$ is almost the same as in the single-feature model, so $R^2$ is almost $0.1$.

If $y$ lies in the subspace spanned by the two vectors $x_1$ and $x_2$ (this can indeed happen; first picture it in 3-dimensional space, and the same holds in general $n$-dimensional space), then the projection is $y$ itself, so $R^2$ is $1$.

Thus, assuming $x_1$ and $x_2$ are linearly independent, the answer is $(0.1, 1]$: the lower bound is approached but never attained, while the upper bound is attained.
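The near-collinear end of the range can be illustrated with a small construction (a sketch, not from the original answer): unit vectors in $\mathbb{R}^3$ chosen so each single-feature no-intercept model has $R^2 = 0.1$ exactly, with $x_2$ a slight rotation of $x_1$ away from $y$:

```python
import numpy as np

# y = e1; each x makes an angle with y such that cos^2 = 0.1,
# so each single-feature no-intercept R^2 is exactly 0.1.
theta = 0.01  # small rotation separating x2 from x1
y = np.array([1.0, 0.0, 0.0])
x1 = np.array([np.sqrt(0.1), np.sqrt(0.9), 0.0])
x2 = np.array([np.sqrt(0.1),
               np.sqrt(0.9) * np.cos(theta),
               np.sqrt(0.9) * np.sin(theta)])

def r2_no_intercept(X, y):
    """R^2 of a no-intercept least-squares fit: 1 - SSR / sum(y^2)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / (y @ y)

r2_1 = r2_no_intercept(x1[:, None], y)
r2_2 = r2_no_intercept(x2[:, None], y)
r2_full = r2_no_intercept(np.column_stack([x1, x2]), y)
print(r2_1, r2_2, r2_full)  # 0.1, 0.1, slightly above 0.1
```

Shrinking `theta` pushes the full-model $R^2$ arbitrarily close to $0.1$ from above, but it never reaches $0.1$ while the two columns remain linearly independent.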

Hess