
Suppose we fit a linear regression to our data $(X_i,Y_i)_{old}$. We then lose the old data points, but we still have access to the coefficients and the summary statistics of that regression. New data $(X_i,Y_i)_{new}$ come in, and we want to fit a linear regression model again. What is the best method to incorporate the statistics from the fit on $(X_i,Y_i)_{old}$ into our new regression?

user127776
  • Are you open to using a Bayesian method with the priors specified by the old coefficients and standard errors? – David Luke Thiessen Jun 30 '21 at 15:09
  • If you have the statistics from the old regression for $n, \sum x_i, \sum x_i^2, \sum y_i, \sum y_i^2, \sum x_i y_i$ (or, equivalently, the count, means, variances and covariance), then you can incorporate them into the new regression. – Henry Jun 30 '21 at 15:10
  • @DavidLukeThiessen If it works why not. – user127776 Jun 30 '21 at 15:11
  • I second the Bayesian method here. Alternatively, you could use the old coefficients as starting points for the optimization, but that is probably not what you are looking for. – Tylerr Jun 30 '21 at 15:11
  • @Henry What is the logic behind those choices? – user127776 Jun 30 '21 at 15:11
  • The simple linear regression coefficients are determined by the means, variances and covariance, and these statistics allow you to calculate them for the combined regression. – Henry Jun 30 '21 at 15:14
  • Specific solutions along the lines indicated by @Henry appear here on CV: see https://stats.stackexchange.com/questions/6920 for instance. Also look for threads about updating means and covariances (or moments generally) with new data -- https://stats.stackexchange.com/questions/51622 will do -- and apply them to regression, using threads that describe how to do regression using only those moments -- such as https://stats.stackexchange.com/questions/107597. – whuber Jun 30 '21 at 15:53
