Suppose we fit a linear regression to our data $(X_i,Y_i)_{old}$. We have since lost the old data points, but we still have access to the coefficients and the statistics of that regression. New data $(X_i,Y_i)_{new}$ comes in, and we want to fit a linear regression model again. What is the best method to incorporate the statistics from the fit to $(X_i,Y_i)_{old}$ into our new regression?
-
Are you open to using a Bayesian method with the priors specified by the old coefficients and standard errors? – David Luke Thiessen Jun 30 '21 at 15:09
-
If you have the statistics from the old regression for $n, \sum x_i, \sum x_i^2, \sum y_i, \sum y_i^2, \sum x_i y_i$ (or their equivalents of number, means, variances and covariance) then you can incorporate them into the new regression – Henry Jun 30 '21 at 15:10
-
@DavidLukeThiessen If it works why not. – user127776 Jun 30 '21 at 15:11
-
I second the Bayesian method here. Alternatively, you could use the old coefficients as starting points for the optimization, but that is probably not what you are looking for. – Tylerr Jun 30 '21 at 15:11
-
@Henry What is the logic behind those choices? – user127776 Jun 30 '21 at 15:11
-
The simple linear regression coefficients are functions of the means, variances and covariance, and the statistics listed above allow you to calculate those quantities, and hence the coefficients, for the combined regression – Henry Jun 30 '21 at 15:14
-
Specific solutions along the lines indicated by @Henry appear here on CV: see https://stats.stackexchange.com/questions/6920 for instance. Also look for threads about updating means and covariances (or moments generally) with new data -- https://stats.stackexchange.com/questions/51622 will do -- and apply them to regression, using threads that describe how to do regression using only those moments -- such as https://stats.stackexchange.com/questions/107597. – whuber Jun 30 '21 at 15:53
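To make the moment-based suggestion in the comments concrete, here is a minimal sketch for simple (one-predictor) linear regression. It assumes the six sums $n, \sum x_i, \sum x_i^2, \sum y_i, \sum y_i^2, \sum x_i y_i$ from the old fit are available or can be reconstructed from the saved statistics; the coefficients alone are not enough. The class `Moments` and its methods `from_data`, `merge` and `fit` are hypothetical names chosen for illustration, not part of any library. The idea is that the sums simply add across batches, and the slope and intercept follow from $b = S_{xy}/S_{xx}$ and $a = \bar y - b\,\bar x$, where $S_{xy} = \sum x_i y_i - \frac{1}{n}\sum x_i \sum y_i$ and $S_{xx} = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2$.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Sufficient statistics for simple linear regression (hypothetical helper)."""
    n: int
    sx: float   # sum of x
    sy: float   # sum of y
    sxx: float  # sum of x^2
    syy: float  # sum of y^2 (not needed for the coefficients, kept for residual variance)
    sxy: float  # sum of x*y

    @classmethod
    def from_data(cls, xs, ys):
        # Accumulate the raw sums from paired observations.
        return cls(
            n=len(xs),
            sx=sum(xs),
            sy=sum(ys),
            sxx=sum(x * x for x in xs),
            syy=sum(y * y for y in ys),
            sxy=sum(x * y for x, y in zip(xs, ys)),
        )

    def merge(self, other):
        # Raw sums are additive, so combining two batches is element-wise addition.
        return Moments(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.syy + other.syy,
            self.sxy + other.sxy,
        )

    def fit(self):
        # Centered sums of squares and cross-products.
        sxx_c = self.sxx - self.sx ** 2 / self.n
        sxy_c = self.sxy - self.sx * self.sy / self.n
        slope = sxy_c / sxx_c
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


# Illustrative (made-up) data: only the old sums are kept, then new data arrives.
old = Moments.from_data([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
new = Moments.from_data([4.0, 5.0], [8.1, 9.8])
intercept, slope = old.merge(new).fit()
print(intercept, slope)
```

Under these assumptions the merged fit is the ordinary least-squares fit on the pooled data, so nothing is lost by discarding the old points. Raw sums can suffer from cancellation when the values are large relative to their spread; the threads on updating means and covariances linked above describe more stable variants.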