
The following is a question pertaining to large-scale ridge regression. I am stumped by it; does anyone have an idea? Thanks.

Suppose the data for the ridge regression problem become available sequentially, i.e. the $k$th data point $x_k$ arrives at time $t_k$. At time $t_k$ we want to be able to compute the optimal ridge estimate $\beta_k$ using all the data $x_1$ through $x_k$ received so far. In the usual method for ridge regression, we would have to store all the previous data points in a database, build the data matrix, and compute the estimate, so the memory requirements grow over time. Explain how you could still compute $\beta_k$ for all $k \ge 1$ while keeping only $O(d^2)$ numbers in the database.

hhkk
  • 1
    I suspect it is a homework problem... but still +1, as I want to know the answer. Stochastic gradient descent? – Haitao Du Mar 05 '18 at 20:34
  • 2
    A combination of two answers on this site immediately gives a good solution: [ridge regression can be implemented as OLS](https://stats.stackexchange.com/a/69209/919) and [OLS can be performed online](https://stats.stackexchange.com/questions/6920/efficient-online-linear-regression). cc @hxd1011 – whuber Mar 05 '18 at 20:40
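One way to read the hint in the comments is sketched below. It is only a sketch under assumptions that the question does not state: each data point $x_k \in \mathbb{R}^d$ arrives together with a scalar response $y_k$, and the ridge penalty $\lambda > 0$ is fixed in advance. Under those assumptions the ridge estimate depends on the data only through $A_k = \lambda I + \sum_{i \le k} x_i x_i^\top$ (a $d \times d$ matrix) and $b_k = \sum_{i \le k} y_i x_i$ (a $d$-vector), so storing those two objects takes $O(d^2)$ numbers no matter how large $k$ gets. The class name `OnlineRidge` and its methods are hypothetical, chosen for illustration.

```python
import numpy as np


class OnlineRidge:
    """Hypothetical helper: ridge regression from running sufficient statistics.

    Assumes each observation is a pair (x, y) with x of length d and scalar y,
    and a fixed penalty lam > 0. Only a d x d matrix and a d-vector are stored.
    """

    def __init__(self, d, lam=1.0):
        # A accumulates lam * I + sum_i x_i x_i^T; b accumulates sum_i y_i x_i.
        self.A = lam * np.eye(d)
        self.b = np.zeros(d)

    def update(self, x, y):
        """Absorb one observation and return the current ridge estimate."""
        x = np.asarray(x, dtype=float)
        self.A += np.outer(x, x)
        self.b += y * x
        # Solve A beta = b instead of forming the inverse explicitly.
        return np.linalg.solve(self.A, self.b)


# Check on simulated data that the streaming estimate matches the batch one.
rng = np.random.default_rng(0)
d, n, lam = 3, 50, 0.5
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

model = OnlineRidge(d, lam)
for x_k, y_k in zip(X, y):
    beta_k = model.update(x_k, y_k)  # x_k and y_k can be discarded afterwards

beta_batch = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(np.allclose(beta_k, beta_batch))
```

Each update here costs $O(d^3)$ time because of the linear solve. If that matters, storing $A_k^{-1}$ instead and applying the Sherman-Morrison rank-one update should bring the per-step cost down to $O(d^2)$ while keeping the same memory footprint.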
