
I have a single output $y$ and multiple inputs $x_1, x_2, \dots, x_n$. I am running online (streaming) regression, which becomes complicated with many inputs. To get around this, I want to run $n$ separate simple OLS regressions ($y$ on $x_1$, $y$ on $x_2$, $\dots$, $y$ on $x_n$) and then combine their $n$ outputs into a single prediction by weighting them appropriately.

Is there a good way to do this? I was thinking of weighting each regression by the absolute value of the correlation between $y$ and $x_i$. Is this a good idea? Can anybody suggest alternatives or relevant resources?

Sven Hohenstein
The Baron
  • One possible thing to consider is updating a Cholesky decomposition of $X'X$ or a QR decomposition of $X$ as you observe each new vector of predictors $x_t$. This will be faster than performing an entire regression at each time step (I believe the regression can be updated in $O(p^2)$ in both cases). A second possibility might be to identify some principal components of the $x$'s from a sample and then update a [principal component regression](https://en.wikipedia.org/wiki/Principal_component_regression), which may be much faster than doing the whole regression and can be updated as above. – Glen_b Aug 25 '15 at 10:31
  • The question as asked isn't terribly constructive, because it posits an invalid approach to multiple regression (as Peter Flom points out in his answer). The actual problem you appear to face, performing online regression, has been asked at http://stats.stackexchange.com/questions/6920, where you will find answers. – whuber Aug 25 '15 at 12:54
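Glen_b's comment suggests updating a Cholesky factor of $X'X$ as each observation arrives instead of refitting. Below is a minimal NumPy sketch of that idea, assuming $p$ predictors per observation (an intercept is just a constant 1 in $x_t$). The names `OnlineOLS`, `chol_rank1_update` and the small `ridge` starting value are illustrative assumptions, not an established API; the update itself is the standard rank-one Cholesky update, written as one plane rotation per column.

```python
import numpy as np


def chol_rank1_update(L, x):
    """Return lower-triangular L_new with L_new @ L_new.T == L @ L.T + outer(x, x).

    Applies one plane (Givens) rotation per column, so the cost is O(p^2).
    """
    L = L.copy()
    x = np.array(x, dtype=float)  # working copy; entries are zeroed one at a time
    for k in range(x.size):
        r = np.hypot(L[k, k], x[k])
        cos, sin = L[k, k] / r, x[k] / r
        L[k, k] = r
        col = L[k + 1:, k].copy()
        L[k + 1:, k] = cos * col + sin * x[k + 1:]
        x[k + 1:] = cos * x[k + 1:] - sin * col
    return L


def forward_sub(L, b):
    """Solve L z = b for lower-triangular L in O(p^2)."""
    z = np.zeros(b.size)
    for i in range(b.size):
        z[i] = (b[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z


def back_sub(U, z):
    """Solve U b = z for upper-triangular U in O(p^2)."""
    b = np.zeros(z.size)
    for i in range(z.size - 1, -1, -1):
        b[i] = (z[i] - U[i, i + 1:] @ b[i + 1:]) / U[i, i]
    return b


class OnlineOLS:
    """Hypothetical streaming least squares via a Cholesky factor of X'X + ridge*I."""

    def __init__(self, p, ridge=1e-6):
        # Assumed tiny ridge: starting from sqrt(ridge) * I keeps the factor
        # positive definite before p linearly independent rows have arrived.
        self.L = np.sqrt(ridge) * np.eye(p)
        self.xty = np.zeros(p)

    def update(self, x_t, y_t):
        x_t = np.asarray(x_t, dtype=float)
        self.L = chol_rank1_update(self.L, x_t)  # X'X <- X'X + x_t x_t'
        self.xty += y_t * x_t                    # X'y <- X'y + y_t x_t

    def coef(self):
        # Solve (L L') b = X'y with two triangular solves.
        return back_sub(self.L.T, forward_sub(self.L, self.xty))


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 3))])  # intercept + 3 inputs
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=500)

model = OnlineOLS(p=X.shape[1])
for x_t, y_t in zip(X, y):
    model.update(x_t, y_t)
print(model.coef())  # should agree closely with np.linalg.lstsq(X, y, rcond=None)[0]
```

Each `update` and each `coef` call costs $O(p^2)$, in line with the comment's estimate, and all $p$ inputs are handled jointly, so nothing has to be approximated by separate simple regressions.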

1 Answer


No, there is no good way to do this, because the regression $y \sim x_1 + x_2$ is not a linear combination of $y \sim x_1$ and $y \sim x_2$. In the multiple regression, the effect of each variable is estimated while controlling for the other; separate simple regressions cannot do that.
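A small simulation illustrates the point, assuming two correlated inputs with made-up coefficients ($y = 2x_1 - x_2 + \text{noise}$, $\operatorname{corr}(x_1, x_2) = 0.8$, both with unit variance). In this setup the population simple slopes are $2 - 0.8 = 1.2$ for $x_1$ and $2(0.8) - 1 = 0.6$ for $x_2$, so the simple slope for $x_2$ even has the opposite sign from its multiple-regression coefficient. Combining the simple fits with the question's $|\operatorname{corr}(y, x_i)|$ weights still gives an affine function of $x_1, x_2$, so its in-sample residual sum of squares can never be below that of the multiple regression.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # corr(x1, x2) = 0.8 by construction
y = 2.0 * x1 - 1.0 * x2 + 0.5 * rng.normal(size=n)


def simple_ols(x, y):
    """Intercept and slope of the simple regression y ~ x."""
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y.mean() - slope * x.mean(), slope


# Multiple regression y ~ x1 + x2 (with intercept).
X = np.column_stack([np.ones(n), x1, x2])
beta_multi = np.linalg.lstsq(X, y, rcond=None)[0]

# The question's scheme: separate simple regressions, weighted by |corr(y, x_i)|.
fits = [simple_ols(x, y) for x in (x1, x2)]
w = np.abs([np.corrcoef(x, y)[0, 1] for x in (x1, x2)])
w = w / w.sum()
yhat_combo = sum(wi * (a + b * x) for wi, (a, b), x in zip(w, fits, (x1, x2)))
yhat_multi = X @ beta_multi

sse_combo = np.sum((y - yhat_combo) ** 2)
sse_multi = np.sum((y - yhat_multi) ** 2)
print("simple slopes:", [round(float(b), 3) for _, b in fits])
print("multiple regression slopes:", beta_multi[1:].round(3))
print("SSE combined:", round(float(sse_combo), 1), " SSE multiple:", round(float(sse_multi), 1))
```

The gap comes from the correlation between $x_1$ and $x_2$: each simple slope absorbs part of the other variable's effect, and no choice of weights on the separate fits can undo that.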

Peter Flom