The following is a question about large-scale ridge regression. I am stumped by it; does anyone have an idea? Thanks.
Suppose the data for the ridge regression problem becomes available sequentially, i.e. the k-th data point x_k arrives at time t_k. At time t_k we want to be able to compute the optimal ridge estimate beta_k using all the data seen so far, x_1 through x_k. In the usual method for ridge regression, we would have to store all the previous data points in a database, create the data matrix, and compute the estimate. Thus, the memory requirements would grow over time. Explain how you could still compute beta_k for all k >= 1, while keeping only O(d^2) numbers in the database.
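
One possible approach, sketched under assumptions the question does not state: each arrival is a pair (x_k, y_k) with a scalar response y_k, each x_k has d features, and the ridge penalty lam is fixed in advance. Under those assumptions beta_k = (sum_i x_i x_i^T + lam*I)^(-1) (sum_i y_i x_i), so it should be enough to keep the d-by-d matrix and the d-vector as running sums and discard each data point after use. The class name StreamingRidge and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


class StreamingRidge:
    """Hypothetical helper: ridge regression from running sufficient statistics."""

    def __init__(self, d, lam=1.0):
        # Assumption: lam > 0 is a fixed ridge penalty, d is the feature count.
        self.A = lam * np.eye(d)  # running value of X^T X + lam * I  (d*d numbers)
        self.b = np.zeros(d)      # running value of X^T y            (d numbers)

    def update(self, x, y):
        """Absorb one observation (x, y) and return the current ridge estimate."""
        x = np.asarray(x, dtype=float)
        self.A += np.outer(x, x)  # rank-one update of the Gram matrix
        self.b += y * x
        # Solve (X^T X + lam * I) beta = X^T y without forming an explicit inverse.
        return np.linalg.solve(self.A, self.b)


# Usage: feed points one at a time; nothing but A and b is stored.
model = StreamingRidge(d=2, lam=0.1)
for x_k, y_k in [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.5)]:
    beta_k = model.update(x_k, y_k)
```

Storage here is d^2 + d numbers regardless of k. Each step costs O(d^2) for the update plus O(d^3) for the solve; if that matters, maintaining the inverse of A directly with the Sherman-Morrison rank-one formula would bring the per-step cost down to O(d^2).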