
I came across this blog post about least angle regression, and at a point he says:

  • Find the variable $x_1$ most correlated with the residual. (Note that the variable most correlated with the residual is equivalently the one that makes the least angle with the residual, whence the name.)

  • Move in the direction of this variable until some other variable $x_2$ is just as correlated.

But I couldn't visualize it. By "residual" he is referring to the vectors minus the average of the vectors, am I right? What could be a geometrical interpretation of "variable correlation with the residual"? What would be a, let's say, 2D example of this interpretation?

Lucas Reis
  • http://stats.stackexchange.com/a/1448 – whuber Jun 12 '12 at 16:40
  • See also "13 ways to look at the correlation coefficient" by Rodgers and Nicewander: http://data.psych.udel.edu/laurenceau/PSY861Psychological%20Statistics%20II%20Spring%202010/READINGS/rodgers-nicewander-1988-r-13-ways.pdf – shabbychef Jun 12 '12 at 17:05
  • The original LARS paper has several illustrative figures of the procedure. Have you also looked there? The linked blog post has an adapted version of one of them. – cardinal Jun 12 '12 at 17:06
  • No pictures, but related: http://stats.stackexchange.com/questions/6795/least-angle-regression-keeps-the-correlations-monotonically-decreasing-and-tied/6933#6933 – cardinal Jun 12 '12 at 17:09
  • @shabby: this article is incredible! I really love this kind of reasoning, multiple points of view on the same concept... Fantastic! – Lucas Reis Jun 13 '12 at 03:01

1 Answer


When two variables are highly correlated, a scatter plot of the observed pairs falls close to a straight line. The same holds here, except that one of the two variables is the residual. To picture it in two dimensions with vectors, suppose you move from the point (Xmean, Ymean) to the point (Xmean + D, Ymean + E). That vector has slope E/D. If the residual vector at (Xmean + D, Ymean + E) changes in X and Y in a similar way to the variable X1 being considered, that is, if the two vectors point in nearly the same direction, then the residual is highly correlated with X1.
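To make the "angle" reading concrete, here is a minimal sketch in Python with NumPy. It relies on a standard identity: once two vectors are centered (mean subtracted), their correlation equals the cosine of the angle between them. The data, the names `x1`, `x2`, `y`, and the helper `correlation_as_cosine` are all made up for illustration. It also assumes the residual is the response minus the current fit, which before any variable enters the model is just the centered response.

```python
import numpy as np

def correlation_as_cosine(x, r):
    """Cosine of the angle between the centered versions of x and r."""
    xc = x - x.mean()
    rc = r - r.mean()
    return float(xc @ rc / (np.linalg.norm(xc) * np.linalg.norm(rc)))

# Hypothetical toy data: three observations, two candidate predictors.
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([1.0, -1.0, 0.5])
y = np.array([1.1, 1.9, 3.2])

# Assumption: no variable is in the model yet, so the fit is the mean of y
# and the residual is the centered response.
residual = y - y.mean()

cos1 = correlation_as_cosine(x1, residual)
cos2 = correlation_as_cosine(x2, residual)

# Angle between each predictor's line and the residual, in degrees.
# The absolute value treats x and -x as the same direction; clip guards
# against rounding pushing the cosine slightly above 1.
angle1 = np.degrees(np.arccos(np.clip(abs(cos1), 0.0, 1.0)))
angle2 = np.degrees(np.arccos(np.clip(abs(cos2), 0.0, 1.0)))

print(f"x1: correlation {cos1:.3f}, angle {angle1:.1f} degrees")
print(f"x2: correlation {cos2:.3f}, angle {angle2:.1f} degrees")
```

In this toy setup `x1` has the larger absolute correlation with the residual and therefore the smaller angle, so it is the variable the first step quoted in the question would pick.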

Michael R. Chernick