
I'm very impressed by this plot from the question "Why does ridge estimate become better than OLS by adding a constant to the diagonal?"

Does anyone have any idea how to produce this plot in R? That is, how do I get RSS values for different parameter estimates? With the `lm` function, I get only one solution!

Thanks for your help!

PLOTZ

1 Answer


It's a plot of the objective function (i.e. the sum of squares, the thing you're trying to minimize with OLS) as a function of the coefficients. One way to compute this is to generate a "grid" of possible coefficients and then compute the sum of squares at each point on the grid.

That is, we fit a regression by finding $\hat\beta \equiv \arg \min_\beta \| y-X\beta \|^2$, i.e. the $\hat\beta$ that minimizes $SS_{reg}$. But we can compute $\| y-X\beta \|^2$ for any $\beta$ value, not just the minimizer. So we can generate a bunch of different $\beta$ values and just plug each one into that $SS_{reg}$ formula. In this case, $\beta$ has two components (i.e. the intercept and slope), so you can visualize all of the possible $SS_{reg}$s as a 3D surface.

As for generating the plot itself, look here for a complete example with R code.
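To make the grid idea concrete, here is a minimal sketch in R. The simulated data, the true coefficients, the seed, and the grid ranges are all my assumptions, chosen only so the surface's minimum falls inside the plotted region:

```r
# Simulate simple-regression data: y = b0 + b1*x + noise
set.seed(1)                         # assumed seed, for reproducibility
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)           # assumed true intercept 1, slope 2

# SS_reg as a function of a candidate (intercept, slope) pair
ss <- function(b0, b1) sum((y - b0 - b1 * x)^2)

# Grid of candidate coefficients, centered near the least-squares estimate
b0_grid <- seq(-1, 3, length.out = 100)
b1_grid <- seq(0, 4, length.out = 100)
ss_surface <- outer(b0_grid, b1_grid, Vectorize(ss))

# 3D surface of the objective; contour() gives a flat view of the same thing
persp(b0_grid, b1_grid, ss_surface, theta = 30, phi = 30,
      xlab = "intercept", ylab = "slope", zlab = "SS_reg")
```

Since the grid contains the least-squares estimate, the lowest point of the surface sits (up to grid resolution) at `coef(lm(y ~ x))`.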

shadowtalker
  • Thanks for this answer, very clear! But I have one concern: how do I generate the different values of β? I can't use the lm function, can I? – PLOTZ Oct 17 '14 at 06:22
  • 1
    PLOTZ - you choose a grid of values for the betas and evaluate $\text{SS}_\text{reg}$ at each grid point. Which values are on your grid depends on which parts of the function you want to look at, but to include the minimum it should include the least squares estimate, probably near the center of the grid. IIRC I used `outer` for this, but `expand.grid` would be another good choice. – Glen_b Oct 17 '14 at 09:59
  • 1
    In order to make the sum of squares have the long valley, I had the data in $(x_1,x_2)$ space lie scattered about a straight line. Because many planes pass through the line, there's not much information in the data about which plane it should be (hence the valley - there's many combinations of $(\hat{\beta_1},\hat{\beta_2})$ with nearly the same fit). – Glen_b Oct 17 '14 at 10:49
  • Thank you Glen. Let me try this, and maybe I'll get back to you if I am stuck. I am just a bit disappointed because, if I understand correctly, the thetas do not come from actually solving the regression vs. ridge regression. Ideally, I would like to make the same kind of plot for the lasso (L1 norm), and I don't know how the thetas should look! – PLOTZ Oct 17 '14 at 21:06
  • @PLOTZ it would be trivial to make a plot for LASSO instead of Ridge. What thetas are you talking about? – shadowtalker Oct 17 '14 at 21:09
  • Sorry, I meant betas (model's parameter) – PLOTZ Oct 17 '14 at 21:11
  • Question: why trivial? Wouldn't we see truncated parameter values with the lasso? – PLOTZ Oct 18 '14 at 22:01
  • It should just yield a different shape around the origin. – shadowtalker Oct 19 '14 at 21:52
  • Yes, and that's the shape I would like to visualize! :-) – PLOTZ Oct 20 '14 at 13:34
  • That's my point. Plug the beta grid values into the formula for the objective and plot it. The only difference between lasso and ridge is the norm of the penalty term added to the RSS (L1 for lasso, L2 for ridge); the formula is otherwise identical – shadowtalker Oct 20 '14 at 14:10
  • oh... I'm very sorry, I think I am completely lost... Let me try to lay out the simulation process: 1/ randomly generate x and y values; 2/ randomly generate beta values; 3/ compute the residual sum of squares. But L1/L2 relate to computing the betas, not the RSS, don't they? So how do I take L1/L2 into account if the values are simulated? Maybe I'm asking too much, but could you show me some R code for generating the data? Thank you very much, and sorry for asking this. – PLOTZ Oct 21 '14 at 05:55
  • any help please? – PLOTZ Oct 24 '14 at 05:36
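Following up on the penalty discussion in the comments, here is one possible sketch in R of how the ridge and lasso objectives differ from plain RSS on the same coefficient grid. The simulated data, the seed, and the value of lambda are assumed for illustration; the near-collinear predictors are chosen to produce the long valley mentioned above:

```r
# Same grid idea as for plain RSS, but with a penalty added to the objective.
set.seed(1)                          # assumed seed, for reproducibility
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)        # nearly collinear, so plain RSS has a long valley
y  <- 2 * x1 + x2 + rnorm(n)         # assumed true coefficients (2, 1)
X  <- cbind(x1, x2)

lambda <- 10                         # assumed penalty strength, for illustration
rss   <- function(b1, b2) sum((y - X %*% c(b1, b2))^2)
ridge <- function(b1, b2) rss(b1, b2) + lambda * (b1^2 + b2^2)        # L2 penalty
lasso <- function(b1, b2) rss(b1, b2) + lambda * (abs(b1) + abs(b2))  # L1 penalty

# Evaluate the lasso objective on the grid and plot its contours
b_grid <- seq(-2, 5, length.out = 100)
contour(b_grid, b_grid, outer(b_grid, b_grid, Vectorize(lasso)),
        xlab = "beta1", ylab = "beta2", main = "lasso objective")
```

Swapping `lasso` for `ridge` in the `outer` call gives the ridge surface; the kink of the L1 penalty at zero is what produces the "truncated" (exactly zero) coefficients the comments ask about.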