
Assuming online/incremental training is not available for a particular algorithm, and assuming that you have a stream of training data that may or may not change over time (e.g. log data), what are the disadvantages of the following approach to defending against concept drift?

  1. Decide on a time window (e.g. one week) and collect all the training data for that time window.
  2. Train a model on the data collected in step 1.
  3. Store this model in an array.
  4. When the time window elapses, train a new model on the new time window's data.
  5. Append this model to your "model array".
  6. Repeat as necessary, or as constrained by resources. Older models can be deleted.
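The training loop in the steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `LineModel` and `train_batch` are hypothetical stand-ins (a 1-D least-squares line fit) for whatever batch-only algorithm you actually use, and `max_models` is a hypothetical resource cap. A `collections.deque` with `maxlen` plays the role of the "model array" and drops the oldest model automatically.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class LineModel:
    """Hypothetical stand-in for any batch-trained model: y = slope * x + intercept."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def train_batch(window: List[Tuple[float, float]]) -> LineModel:
    """Fit ordinary least squares on one window of (x, y) pairs (assumed batch learner)."""
    n = len(window)
    mean_x = sum(x for x, _ in window) / n
    mean_y = sum(y for _, y in window) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in window)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in window)
    slope = cov_xy / var_x if var_x else 0.0
    return LineModel(slope, mean_y - slope * mean_x)


def build_model_array(windows: Iterable[List[Tuple[float, float]]], max_models: int) -> deque:
    """Train one model per elapsed window and append it; keep at most max_models."""
    models: deque = deque(maxlen=max_models)  # appending past maxlen discards the oldest
    for window in windows:
        models.append(train_batch(window))
    return models


# Usage: three weekly windows whose x -> y relationship drifts over time.
week1 = [(float(x), 2.0 * x) for x in range(10)]
week2 = [(float(x), 2.0 * x + 1.0) for x in range(10)]
week3 = [(float(x), 3.0 * x) for x in range(10)]
models = build_model_array([week1, week2, week3], max_models=2)
print(len(models))                              # 2 (week1's model was dropped)
print([round(m.slope, 6) for m in models])      # [2.0, 3.0]
```

Bounding the array with `maxlen` is one way to implement "older models can be deleted"; a time-based or accuracy-based eviction rule would be equally valid.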

For inference, gather the predictions from each model in the array and average the results, possibly adding weighting to favour more recently trained models.
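The inference step described above might look like the following minimal sketch. It assumes the models are ordered oldest to newest and exposed as plain callables (e.g. a model's `predict` method). The geometric `decay` weighting is an assumed scheme chosen for illustration; the question only says weighting is "possible", and many other schemes would fit.

```python
from typing import Callable, Sequence


def ensemble_predict(
    models: Sequence[Callable[[float], float]], x: float, decay: float = 1.0
) -> float:
    """Weighted average of per-model predictions; models are ordered oldest -> newest.

    decay == 1.0 gives a plain average. decay < 1.0 down-weights older models
    geometrically (an assumed weighting scheme, not the only option).
    """
    if not models:
        raise ValueError("model array is empty")
    n = len(models)
    # The newest model has age 0 and weight 1; a model k windows older gets decay ** k.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    return sum(w * m(x) for w, m in zip(weights, models)) / total


# Usage with three toy models (oldest first) standing in for trained models.
models = [lambda x: 2 * x, lambda x: 2 * x + 1, lambda x: 3 * x]
print(ensemble_predict(models, 10.0))             # plain average of 20, 21, 30
print(ensemble_predict(models, 10.0, decay=0.5))  # 26.0, pulled toward the newest model
```

A smaller `decay` reacts faster to drift but discards more of what older models learned; `decay=1.0` treats every stored window as equally relevant.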

This seems to make sense intuitively; however, I haven't found much research supporting this approach.

What would be the disadvantages of this approach?

deemel
dvas0004
  • Using ensemble methods for stream mining is common, and you should find a fair amount of research on it. I suggest starting with [this survey article](http://scholarscompass.vcu.edu/cgi/viewcontent.cgi?article=1038&context=cmsc_pubs) by Bartosz Krawczyk and maybe having a look at his other work. – deemel Nov 12 '19 at 15:59
  • Thanks @deemel! This is a fantastic paper, and section 4.1.3 is exactly what I had in mind. Evidently my Google-Fu skills need some work. – dvas0004 Nov 14 '19 at 11:16
  • What would be the appropriate way to answer this question? Should I summarize what I read and self-accept the answer, or should it be left to someone else? Either way is fine by me. – dvas0004 Nov 14 '19 at 11:17
  • If you believe you can answer your question sufficiently (i.e. what would be the disadvantages of said strategy?), feel free to post an answer. However, I'd suggest not self-accepting it, as over time someone might still be inclined to give a (possibly more elaborate) answer. – deemel Nov 15 '19 at 08:13

0 Answers