
Say I have two datasets with the same features but different samples. If I build two linear models, one for each dataset, and then take a weighted average of the coefficients of these two models (say the weights are the number of samples in the dataset each model was built on), what can I say about the resulting meta-model?

Model 1, built over the first $m$ samples:
$$ Y_0 = \sum_{i=0}^{k} \beta_{0,i}\, x_{i} $$
Model 2, built over the remaining samples ($n$ in this case):
$$ Y_1 = \sum_{i=0}^{k} \beta_{1,i}\, x_{i} $$
The merged model that I am interested in:
$$ Y_{1+0} = \sum_{i=0}^{k} \frac{m\,\beta_{0,i} + n\,\beta_{1,i}}{m+n}\, x_{i} $$
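To make this concrete, here is a minimal sketch in Python/NumPy of the procedure I have in mind (the data and variable names are made up for illustration): fit each model by ordinary least squares, then form the sample-size-weighted average of the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two datasets with the same k features but different samples (m and n rows).
k, m, n = 3, 120, 80
X0, X1 = rng.normal(size=(m, k)), rng.normal(size=(n, k))
beta_true = np.array([1.5, -2.0, 0.5])
y0 = X0 @ beta_true + rng.normal(scale=0.1, size=m)
y1 = X1 @ beta_true + rng.normal(scale=0.1, size=n)

# Fit each model separately by ordinary least squares.
beta0, *_ = np.linalg.lstsq(X0, y0, rcond=None)
beta1, *_ = np.linalg.lstsq(X1, y1, rcond=None)

# Sample-size-weighted average of the coefficients, as in the merged model above.
beta_merged = (m * beta0 + n * beta1) / (m + n)
print(beta_merged)
```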

I have looked into ensembling, and as far as I can tell it is common practice to combine multiple models at prediction time (averaging the results for regression, taking a majority vote for classification), but I have yet to find a discussion of what the result of merging the models themselves would be.

Nate Parke
  • As a minor correction, I think you need to redo your final equation to fix the denominator, e.g. divide by $m+n$ instead of $2$, to get the weighted average. – jbowman Nov 29 '17 at 19:02
  • Is your question about *this particular process of weighting* the models (there are better ones that have theoretical justification, btw) or is it really concerned about *whether and how* to combine regression results? – whuber Nov 29 '17 at 19:02
  • @whuber it is a bit of both I suppose. I would be interested in the better model merging procedures, but I would also like to understand what the follies of this procedure are. – Nate Parke Nov 29 '17 at 19:20
  • Note that for linear models averaging the coefficients in this way is equivalent to averaging the predictions (see the sketch after these comments). – Jonny Lomond Nov 29 '17 at 20:24
  • The right way to average weights the predictions by the inverses of their variances. Those usually will not be the same as the data-count weights. One way to see why not is to consider the case of simple regression where the ranges of the $x_i$ don't even overlap between the two fits. It should be clear that the greatest weight should be given to the coefficient from the applicable fit. The result won't even be a linear function of $x$! – whuber Nov 29 '17 at 20:30
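To illustrate Jonny Lomond's point: because a linear model's prediction is linear in its coefficients, averaging the coefficients with weights $m/(m+n)$ and $n/(m+n)$ gives exactly the same predictions as averaging the two models' predictions with those same weights. A quick numeric check (the coefficients below are arbitrary made-up values, not fits from real data):

```python
import numpy as np

rng = np.random.default_rng(1)
k, m, n = 3, 120, 80
X_new = rng.normal(size=(10, k))   # new points to predict on
beta0 = rng.normal(size=k)         # pretend these came from model 1
beta1 = rng.normal(size=k)         # ...and these from model 2

w0, w1 = m / (m + n), n / (m + n)

# Average the coefficients, then predict.
pred_merged = X_new @ (w0 * beta0 + w1 * beta1)

# Predict with each model, then average the predictions with the same weights.
pred_averaged = w0 * (X_new @ beta0) + w1 * (X_new @ beta1)

print(np.allclose(pred_merged, pred_averaged))  # True
```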

1 Answer


It's not clear why you'd want to train two separate models on two samples coming from the same population. If your goal is to check the consistency of results between the two models, you should carefully choose an appropriate sampling method so that the two samples are not too heterogeneous. If they are heterogeneous, in other words if the underlying data distributions are different, the two models will produce different parameter estimates and combining them won't be such a good idea.

Also, the above method would be tricky to implement when you have longitudinal or time-series data.

Shuvajit
  • My use case would be very large datasets where it is expensive to move the data from one data center to another, whereas sending the weights of a model built over each of the respective datasets would be cheap. I am assuming the underlying distributions are homogeneous. Why do you say it would be difficult with longitudinal and time series data? – Nate Parke Nov 29 '17 at 19:42
  • How many variables appear in the model? Unless it's huge, it ought to be practicable to communicate the SSP matrices, which is all you need to update the estimates. With $p$ variables these have $(p+1)(p+2)/2$ coefficients. (A sketch of this idea appears after these comments.) – whuber Nov 29 '17 at 23:23
  • Sampling of time-series data is tricky because you can't just partition the data on the time dimension. For example, if there are inherent seasonal patterns in the data, sampling may lead to truncated time series with or without those effects. – Shuvajit Nov 30 '17 at 18:12
  • @whuber, do you have any literature suggestions describing the process for computing that update? – Nate Parke Dec 01 '17 at 19:34
  • I don't, Nate, but the formulas have appeared here in several places. Naturally, it's easiest for me to remember and find one of my own contributions at https://stats.stackexchange.com/questions/51622/combining-two-covariance-matrices/51927#51927, but I know there are some other posts on the subject, too. Here's one search for them: https://stats.stackexchange.com/search?q=covariance+combine. – whuber Dec 01 '17 at 20:45
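For the plain OLS case, whuber's suggestion of sharing cross-product (SSP) statistics instead of fitted coefficients can be sketched as follows. This is a minimal illustration with made-up data, not code from any of the linked posts: each data center computes $X^\top X$ and $X^\top y$ locally, the statistics are added, and solving the combined normal equations reproduces the OLS fit on the pooled data exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
k, m, n = 3, 120, 80
X0, X1 = rng.normal(size=(m, k)), rng.normal(size=(n, k))
beta_true = np.array([1.5, -2.0, 0.5])
y0 = X0 @ beta_true + rng.normal(scale=0.1, size=m)
y1 = X1 @ beta_true + rng.normal(scale=0.1, size=n)

# Each data center only has to share its cross-product ("SSP") statistics:
# a k x k matrix and a length-k vector, regardless of how many rows it holds.
XtX0, Xty0 = X0.T @ X0, X0.T @ y0
XtX1, Xty1 = X1.T @ X1, X1.T @ y1

# Add the statistics and solve the combined normal equations.
beta_combined = np.linalg.solve(XtX0 + XtX1, Xty0 + Xty1)

# This reproduces the OLS fit on the pooled data exactly.
X_all, y_all = np.vstack([X0, X1]), np.concatenate([y0, y1])
beta_pooled, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
print(np.allclose(beta_combined, beta_pooled))  # True
```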