
Possible Duplicate:
Efficient online linear regression

Is there a summation representation for multivariate regressions?

For example, when I regress $y$ on $X$ and $X$ is univariate, instead of using $\hat \beta = (X'X)^{-1} X'y$ I can use $\hat \beta = \frac{ \overline{xy} - \bar{x}\bar{y} }{ \overline{x^2} - \bar{x}^2 }$ and $\hat \alpha = \bar y - \hat \beta \bar x$ (and then use $\hat \beta$ and $\hat \alpha$ to find the residuals). This is a huge time saver in rolling regressions.
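For concreteness, here is a minimal sketch of that univariate shortcut in Python/NumPy (the code and the function name are mine, not part of the original question):

```python
import numpy as np

def ols_univariate(x, y):
    """Slope and intercept from sample means:
    beta  = (mean(xy) - mean(x) mean(y)) / (mean(x^2) - mean(x)^2)
    alpha = mean(y) - beta * mean(x)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x ** 2) - x.mean() ** 2)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta
```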

Is there a similar summation representation for multivariate regressions, or are there other time-saving techniques? I am using built-in regression routines and matrix multiplication and find that they can be very time-consuming when I do rolling regressions on a large panel. Thanks!


Update: It would have been clearer if I hadn't used shorthand for $\hat \beta$. In the calculation I actually use something like $$\hat \beta = \frac{ \sum_{i=1}^{n} x_{i} y_{i} - \frac1n \sum_{i=1}^{n} x_{i} \sum_{j=1}^{n} y_{j} }{ \sum_{i=1}^{n} x_{i}^2 - \frac1n \left( \sum_{i=1}^{n} x_{i} \right)^2 },$$ with the summation over each window done by differencing the cumulative sum.
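A minimal sketch of that cumulative-sum trick (again my own code, not from the question): each cumulative sum is computed once, every window's sums are recovered by differencing, and the formula above is then applied to all windows at once:

```python
import numpy as np

def rolling_ols_univariate(x, y, window):
    """Rolling intercept and slope over fixed-width windows using
    cumulative sums differenced at the window edges."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a zero so that sum over [i, i + window) = csum[i + window] - csum[i].
    cx  = np.concatenate(([0.0], np.cumsum(x)))
    cy  = np.concatenate(([0.0], np.cumsum(y)))
    cxy = np.concatenate(([0.0], np.cumsum(x * y)))
    cxx = np.concatenate(([0.0], np.cumsum(x * x)))
    n = float(window)
    sx,  sy  = cx[window:]  - cx[:-window],  cy[window:]  - cy[:-window]
    sxy, sxx = cxy[window:] - cxy[:-window], cxx[window:] - cxx[:-window]
    beta  = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
    alpha = (sy - beta * sx) / n
    return alpha, beta  # one estimate per window, oldest window first
```

One caveat: with long series, differencing large cumulative sums can lose floating-point precision, so it is worth spot-checking a few windows against a direct regression.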

Richard Herron
    If by "rolling regression" you mean regressions within moving windows, then your question has been asked and answered at http://stats.stackexchange.com/questions/6920/efficient-online-linear-regression . If this does not provide a full answer, please clarify what you need. – whuber Oct 12 '11 at 16:24
  • @whuber -- Thanks for the link, but I think I am looking for something more naive. I don't need to update my coefficients, but rather I would like to determine coefficients using the same-sized window that moves over time (e.g., find beta every month using one year of data). The summation notation is great because if I find the cumulative sum, I can very quickly find the sum over the estimation window using a difference operator. I am interested in whether there's a summation representation for multivariate regressions so that I can use this time-saving technique with multivariate models. – Richard Herron Oct 12 '11 at 16:36
    That other thread sounds like *exactly* what you're looking for. It describes an algorithm in which you find the new betas in terms of the old hat matrix, the new case that just entered the window, and the old case that just left it. That will be a *lot* faster than recomputing the SSP matrix $X'X$ and inverting it again. As far as "summation representation" goes, it's unclear what you mean, because even your univariate example has no explicit summation in it. – whuber Oct 12 '11 at 17:11
  • @whuber -- Oh, now I see. Sorry. I missed that I can use the technique to add _and_ remove rows. I will have to code that and see if it yields speed gains over conventional OLS (the summation form provides speed gains of about 50x in my trials). Unless you think there's a generalized summation form, I guess my question should be closed. – Richard Herron Oct 12 '11 at 19:18
    The generalization exists but is complicated and algorithmically awful: you have to replace the quotient by an explicit matrix inverse. Feel free to re-open this question if the other solution doesn't work out. – whuber Oct 12 '11 at 20:42
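Following up on whuber's last comment, here is a minimal sketch of that generalization (my own code, not from the thread): the scalar sums become running sums of $X'X$ and $X'y$, still recovered per window by differencing cumulative sums, but the scalar quotient is replaced by a $k \times k$ solve in every window.

```python
import numpy as np

def rolling_ols_multivariate(X, y, window):
    """Rolling OLS coefficients from cumulative sums of X'X and X'y.

    X is (T, k) and should already contain a column of ones if an
    intercept is wanted; returns one k-vector of betas per window."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T, k = X.shape
    # Running sums of the outer products x_i x_i' and of x_i y_i,
    # with a leading zero block so each window is a simple difference.
    cxx = np.zeros((T + 1, k, k))
    cxy = np.zeros((T + 1, k))
    cxx[1:] = np.cumsum(X[:, :, None] * X[:, None, :], axis=0)
    cxy[1:] = np.cumsum(X * y[:, None], axis=0)
    betas = np.empty((T - window + 1, k))
    for i in range(T - window + 1):
        XtX = cxx[i + window] - cxx[i]   # this window's X'X
        Xty = cxy[i + window] - cxy[i]   # this window's X'y
        betas[i] = np.linalg.solve(XtX, Xty)
    return betas
```

The per-window solve is what makes this less attractive than the scalar case; the approach in the linked thread avoids it by updating $(X'X)^{-1}$ with rank-one corrections as one observation enters the window and one leaves.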
