
I figured out what was wrong: I wasn't centering the data like the lm.ridge function does. However, I still cannot reproduce the intercept that lm.ridge() gives me.

According to my research, you can simulate a ridge regression by appending "phony data" to an ordinary OLS regression, i.e., by augmenting the covariate matrix with extra rows forming a diagonal matrix with sqrt(lambda) along the diagonal, and appending matching zeros to the response. One of many places that corroborate this notion is the CV thread: Ridge penalized GLMs using row augmentation?
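In other words, ordinary least squares on the augmented system minimizes exactly the ridge objective:

$$\min_\beta \left\lVert \begin{pmatrix} y \\ 0 \end{pmatrix} - \begin{pmatrix} X \\ \sqrt{\lambda}\, I_p \end{pmatrix} \beta \right\rVert^2 = \min_\beta \lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2$$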

However I fail to replicate the results in R. Here are my three variables:

> test_0
12 34 24 64 746 24 23 42 7 8 3 4 45 675 3 4 34 43 56 674 3 4 54 34 23 34 435 56 56 234 657 89 980 8 76 65 45564 67 76 789

> test_1
34 24 64 746 24 23 42 7 8 3 4 45 675 3 4 34 43 56 674 3 4 54 34 23 34 435 56 56 234 657 89 980 8 76 65 45564 67 76 789 6

> test_2
24 64 746 24 23 42 7 8 3 4 45 675 3 4 34 43 56 674 3 4 54 34 23 34 435 56 56 234 657 89 980 8 76 65 45564 67 76 789 6 5

I then append two new rows (one for each independent variable). To test_0 (the response), I append two zeros; to test_1, I append sqrt(.5) and 0; to test_2, I append 0 and sqrt(.5):

a = c(test_0, 0, 0)
b = c(test_1, (sqrt(.5)), 0)
c = c(test_2, 0, (sqrt(.5)))

Then I run two models, lm and lm.ridge (the latter from the MASS package):

reg = lm(a~b+c)
ridge = lm.ridge(test_0~test_1+test_2, lambda=.5)
reg
# Call:
# lm(formula = a ~ b + c)
# 
# Coefficients:
# (Intercept)            b            c  
#  1305.42310     -0.02926     -0.02862  

ridge
#                      test_1        test_2 
# 1374.16801379   -0.03059968   -0.02996396 

The coefficients are different but they should be the same (I have also tried the above using a lambda of 1 and still get the inconsistency). Why is this the case?

Stevens
  • Okay so I figured out what was wrong! I wasn't centering the data like the lm.ridge function does. However I still cannot reproduce the intercept that lm.ridge gives me. – Stevens Sep 18 '15 at 18:11
  • You can post this as an answer to help future users. They may even upvote your answer to allow you to accrue reputation. – Sycorax Sep 18 '15 at 19:48
  • You should mention in your post what package `lm.ridge` is from – Glen_b Sep 19 '15 at 00:18
  • Added an answer! It's got to do with the fact that their objective function has an extra 1/n factor, which means lambda has to be scaled by a factor n... Also, you have to remember not to penalize the intercept. – Tom Wenseleers Aug 30 '19 at 09:22

2 Answers


The reason my values were off is that the regular ridge regression method assumes you center your data, both the y vector and the X matrix (lm.ridge additionally scales the columns of X).

If you center first and then run an OLS with the "phony data" added in, you get the ridge regression betas. The intercept is another story, which I haven't figured out.
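For reference, here is a minimal sketch that appears to reproduce lm.ridge exactly, assuming its convention (per the MASS source) of centering y and X and scaling each centered column of X by its root mean square with denominator n; the intercept then falls out as mean(y) minus the back-transformed slopes dotted with the column means:

library(MASS)
X  <- cbind(test_1, test_2)                # covariates from the question
y  <- test_0
n  <- nrow(X)
Xm <- colMeans(X); ym <- mean(y)
Xc <- sweep(X, 2, Xm)                      # center the columns of X
Xscale <- sqrt(colSums(Xc^2)/n)            # lm.ridge's scaling (denominator n, not n-1)
Xs <- sweep(Xc, 2, Xscale, "/")
yc <- y - ym                               # center the response
lambda <- 0.5
Xa <- rbind(Xs, sqrt(lambda)*diag(2))      # augment with sqrt(lambda)*I rows
ya <- c(yc, 0, 0)                          # matching zeros in the response
b_scaled <- coef(lm(ya ~ Xa - 1))          # OLS without intercept on augmented data
b  <- b_scaled/Xscale                      # back-transform to the original scale
c(Intercept = ym - sum(Xm*b), b)           # intercept = ybar - xbar' beta
coef(lm.ridge(test_0 ~ test_1 + test_2, lambda = 0.5))  # should agree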

Stevens

Here is some code showing that the row augmentation method (as well as a much faster solve-based method) gives ridge regression coefficients identical to the output of the ridge and penalized packages. All R implementations of ridge regression seem to assume that your model has an intercept and that this intercept is not penalized. Also, the ridge and penalized packages have an extra 1/n factor in their objective function, which means you have to multiply the lambda you use there by n to get the same result. Some ridge implementations also normalize your covariate matrix and/or dependent variable; below I disabled that.
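To see why the rescaling works: multiplying an objective by a constant does not change its minimizer, so (taking that 1/n form of the objective at face value)

$$\arg\min_\beta \frac{1}{n}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert^2 \;=\; \arg\min_\beta \lVert y - X\beta\rVert^2 + n\lambda\lVert\beta\rVert^2.$$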

Here it is in R code for your example, with some timings:

y=c(12,34,24,64,746,24,23,42,7,8,3,4,45,675,3,4,34,43,56,674,3,4,54,34,23,34,435,56,56,234,657,89,980,8,76,65,45564,67,76,789)
x1=c(34,24,64,746,24,23,42,7,8,3,4,45,675,3,4,34,43,56,674,3,4,54,34,23,34,435,56,56,234,657,89,980,8,76,65,45564,67,76,789,6)
x2=c(24,64,746,24,23,42,7,8,3,4,45,675,3,4,34,43,56,674,3,4,54,34,23,34,435,56,56,234,657,89,980,8,76,65,45564,67,76,789,6,5)
X=cbind(x1,x2)
# you can add X=scale(X) to pre-normalize your covariate matrix first if you like
# or center & divide by the L2 norm via X=apply(X,2,function(col) (col-mean(col))/norm(col-mean(col),"2"))
lambda=0.5
n=nrow(X)
p=ncol(X)

lmridge_rbind = function (X, y, lambda, intercept = TRUE) {
  # ridge regression via row augmentation: append sqrt(lambda)*I rows to X
  # and matching zeros to y, then solve the augmented OLS problem
  if (intercept) {
    lambdas = c(0, rep(lambda, ncol(X)))  # do not penalize the intercept
    X = cbind(1, X)                       # prepend a column of ones
  } else { lambdas = rep(lambda, ncol(X)) }
  qr.solve(rbind(X, diag(sqrt(lambdas))), c(y, rep(0, ncol(X))))
}
lmridge_solve = function (X, y, lambda, intercept = TRUE) {
  # ridge regression via the normal equations: (X'X + diag(lambdas)) b = X'y
  if (intercept) {
    lambdas = c(0, rep(lambda, ncol(X)))  # do not penalize the intercept
    X = cbind(1, X)
  } else { lambdas = rep(lambda, ncol(X)) }
  solve(crossprod(X) + diag(lambdas), crossprod(X, y))[, 1]
}

library(ridge)
library(penalized)

library(microbenchmark)
microbenchmark("ridge::linearRidge" = { ridge <- coef(ridge::linearRidge(y~1+x1+x2, lambda=lambda*n, scaling="none")) },
               "penalized::penalized" = { ridge_penalized <- coef(penalized::penalized(response = y, penalized = X, 
                                                                   lambda1 = 0, lambda2 = lambda*n, positive = FALSE, model= "linear", trace = FALSE)) },
               "lmridge_rbind" = { ridge_rbind <- lmridge_rbind(X, y, lambda = lambda, intercept = TRUE) },
               "lmridge_solve" = { ridge_solve <- lmridge_solve(X, y, lambda = lambda, intercept = TRUE) }, times=10)
# Unit: microseconds
#                  expr      min       lq      mean   median       uq      max neval cld
# ridge::linearRidge     886.952  923.730 1130.4573  965.854 1293.649 1959.504    10   b 
# penalized::penalized  2867.411 3021.794 3357.5860 3221.507 3628.205 4315.870    10   c
# lmridge_rbind          118.888  146.686  203.1353  167.426  218.958  487.952    10   a  
# lmridge_solve           42.339   62.866   84.1628   71.419  112.474  159.087    10   a 
# lmridge_solve is by far the fastest option!!
cbind(ridge,ridge_penalized,ridge_rbind,ridge_solve) # coefficients are identical
#                     ridge ridge_penalized   ridge_rbind   ridge_solve
# (Intercept) 1375.17893744   1375.17893744 1375.17893824 1375.17893824
# x1            -0.03099379     -0.03099379   -0.03099379   -0.03099379
# x2            -0.03035035     -0.03035035   -0.03035035   -0.03035035
Tom Wenseleers