
Possible Duplicate:
R-squared: X “explains” the percentage of variation of the Y values. Does axis order matter?

Excuse the strange question, but I have a doubt about linear regression. Take a look at the code below:

  x = rnorm(100)          # 100 independent draws from a standard normal
  y = rnorm(100)          # another 100 draws, unrelated to x

  mod1 = lm(x ~ y + 0)    # regress x on y, no intercept
  mod2 = lm(y ~ x + 0)    # regress y on x, no intercept

As you can see, I swap the regressand and the regressor between the two linear models. My question is about the residuals of those two models: why should they be different if I use the same variables?

Dail
  • Answered at http://stats.stackexchange.com/a/18448. – whuber Dec 07 '11 at 16:58
  • There's a good answer at the above link. It may also help you to read my answer here: [what-is-the-difference-between-doing-linear-regression-on-y-with-x-versus-x-with-y?](http://stats.stackexchange.com/questions/22718/22721#22721), which is explained conceptually / graphically instead of mathematically. – gung - Reinstate Monica Aug 13 '12 at 14:32

1 Answer


No, the two models are not the same. In linear regression you are trying to find the best-fit line: the one that minimizes the sum of squared differences between the actual $y$ values and the estimates obtained by assuming $y$ can be approximated by $x\beta$.
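Concretely, for the no-intercept model in the question, the least-squares objective (written here in the question's notation) is

$$\hat\beta = \arg\min_{\beta}\sum_{i=1}^{n}\left(y_i - \beta x_i\right)^2.$$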

To see how the above translates into a graphical picture, imagine the traditional plot of $y$ vs $x$ (i.e., $y$ values plotted on the Y-axis against $x$ values on the X-axis). Using that graph as a visual aid, we can see that the linear regression $y=x\beta + \epsilon$ amounts to minimizing the vertical error between the estimate provided by the line and the actual value we have at hand.

In contrast, the linear regression of $x=y\alpha+\epsilon$ (i.e., when you switch the role of $x$ and $y$) amounts to minimizing the horizontal error between the estimate provided by the line and the actual value at hand.
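Writing out both no-intercept fits makes the asymmetry explicit. Their closed-form slopes (a standard least-squares result, stated here in the question's notation) are

$$\hat\beta = \frac{\sum_i x_i y_i}{\sum_i x_i^2}, \qquad \hat\alpha = \frac{\sum_i x_i y_i}{\sum_i y_i^2},$$

and $\hat\alpha = 1/\hat\beta$ only when the points lie exactly on a line through the origin.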

Thus, in general we get different slopes (i.e., different lines) and hence different residuals.
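You can check this directly in R. Here is a minimal sketch using the question's setup, with a seed added for reproducibility (the seed value is arbitrary):

  set.seed(42)            # added for reproducibility; any seed will do
  x = rnorm(100)
  y = rnorm(100)

  mod1 = lm(x ~ y + 0)    # minimizes horizontal errors in the y-vs-x plot
  mod2 = lm(y ~ x + 0)    # minimizes vertical errors

  coef(mod1)              # slope of the x-on-y fit
  1 / coef(mod2)          # reciprocal of the y-on-x slope; not the same number
  head(resid(mod1))       # residuals measured along the x-axis
  head(resid(mod2))       # residuals measured along the y-axis; different values

Note that the two sets of residuals are not even in the same units: mod1's residuals are deviations in $x$, while mod2's are deviations in $y$.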

varty