
I'm a software developer working on a system that stores thousands of independent metrics, each with several tens of thousands of timeseries data points. We'd like to make predictions about where the metrics are headed.

For each metric, we want to do two things:

  1. Make a prediction for as many of the observations as we can, as if we had seen only the observations up to that point.

That is, if we have a sequence of timeseries observations $O = [(t_0, v_0), (t_1, v_1), (t_2, v_2), \cdots]$, then we want to produce a new prediction series $P = [(t_0, p_0), (t_1, p_1), (t_2, p_2), \cdots]$, with the same time indices. For a given time $t$, the series can only use observations that came before $t$ to make the prediction at $t$.

The predictor is permitted to require some minimum number of observations before making any predictions -- that is, the prediction series can start at an index later than $t_0$ (but $P$ can only include time indices present in $O$).

The time indices $t_0, t_1, \cdots$ are not guaranteed to be equally spaced.

  2. In the future, when new observations for that metric come in, we don't want to have to recompute all the previous predictions to make one more prediction.

That is, we might first receive 500 observations, make ~500 predictions, receive 3 more observations, make 3 more predictions, and so on. But the cost of making those additional 3 predictions should not include recomputing the first 500 -- the cost to compute one additional prediction should, ideally, be constant (or hopefully bounded).
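
To make the shape of this concrete, here is a rough Python sketch of the call pattern I have in mind (the class and method names are purely illustrative, not an existing library):

```python
class IncrementalPredictor:
    """Illustrative interface only; the names are made up."""

    def observe(self, t, v):
        """Absorb one new observation (t, v), ideally in constant time."""
        raise NotImplementedError

    def predict(self, t):
        """Predict the value at time t using only observations already seen,
        or return None if the minimum number of observations hasn't been met."""
        raise NotImplementedError


def run(predictor, observations):
    """For each incoming point, predict first and observe second, so the
    prediction at t never uses the observation at t."""
    predictions = []
    for t, v in observations:
        p = predictor.predict(t)
        if p is not None:
            predictions.append((t, p))
        predictor.observe(t, v)
    return predictions
```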

Finally, because I'll be building this myself, predictive accuracy is less important than keeping the implementation simple. I'd be writing this in Python or Ruby, which have passable statistical libraries, but they don't compare to R or MATLAB, and one can't generally assume that things are done for you.

Are there well-known statistical predictive models with reference implementations that satisfy these conditions?

2 Answers


You are asking about the class of algorithms or models known as (recursive|online|incremental|streaming) (estimation|regression|forecasting|learning) algorithms. Generally, I see the terms "recursive regression", "incremental learning", "streaming (machine) learning" or "online learning" used most frequently. Online learning may not be $O(1)$, as it intersects with a wider set of topics, but the first three should be very relevant and have $O(1)$ model parameter update and prediction computation times. There is a whole literature devoted to these topics, which is largely the same topic split among engineering, computer science, and statistics researchers (who each use different terms for it).

The simplest member of this class is recursive least squares (RLS), which is ordinary least squares regression computed recursively. It's $O(1)$ for model parameter updates and $O(1)$ for predictions, with respect to the number of past observations. Python has the basic reference implementation in statsmodels, a very good and widely used statistics library.
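
To show the mechanics, here is a minimal from-scratch sketch of the RLS update (this is not statsmodels' implementation, and the intercept-plus-time features are just illustrative):

```python
import numpy as np

class RecursiveLeastSquares:
    """Plain RLS: each update costs O(d^2) for d features,
    independent of how many observations have already been seen."""

    def __init__(self, n_features, delta=1000.0):
        self.w = np.zeros(n_features)         # coefficient estimates
        self.P = np.eye(n_features) * delta   # approximate inverse of X'X

    def predict(self, x):
        return float(self.w @ x)

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)               # gain vector
        self.w += k * (y - self.w @ x)        # correct the coefficients
        self.P -= np.outer(k, Px)             # update the inverse matrix

# Illustrative use on a tiny series: regress the value on [1, t],
# predicting each point before its observation is folded in.
rls = RecursiveLeastSquares(n_features=2)
for t, v in [(0.0, 1.0), (1.5, 1.9), (2.0, 2.4), (4.0, 4.1)]:
    x = np.array([1.0, t])
    print(t, rls.predict(x))   # uses only earlier observations
    rls.update(x, v)
```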

After that, stepping up the complexity and power of the model a bit, you have Kalman filters, which are implemented in several Python libraries. The recursive, $O(1)$ implementations might be a little hard to find though, so here's a good description of the recursive Kalman filter and here's an implementation of the recursive Kalman filter.
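
To give the flavor, here is a minimal sketch of the simplest useful case, a scalar local-level (random walk plus noise) model; the noise variances are illustrative and would need tuning per metric:

```python
class LocalLevelKalman:
    """Scalar Kalman filter for a local-level model: the one-step-ahead
    prediction of the next observation is the current level estimate,
    and each update is O(1)."""

    def __init__(self, process_var=1e-2, obs_var=1.0,
                 initial_level=0.0, initial_var=1e6):
        self.q = process_var      # how quickly the underlying level drifts
        self.r = obs_var          # observation noise variance
        self.level = initial_level
        self.var = initial_var

    def predict(self):
        return self.level         # one-step-ahead prediction

    def update(self, observation):
        # Level random-walks between steps; for unevenly spaced times,
        # scale q by the elapsed time since the last observation.
        self.var += self.q
        gain = self.var / (self.var + self.r)          # Kalman gain
        self.level += gain * (observation - self.level)
        self.var *= (1.0 - gain)
```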

Stepping up a bit more, you can enter the domain of nonlinear recursive regression algorithms. I'll not say too much on this, as the topic gets complicated rather quickly. But if you have the time and background necessary to delve into this, here's a classic paper that adds the nonlinearity by kernelizing recursive least squares: Yaakov Engel, Shie Mannor, and Ron Meir, "The Kernel Recursive Least-Squares Algorithm", 2003.

There are also recursive versions of many of your favorite traditional time series forecasting algorithms. By that, I mean recursive autoregression: recursive ARMA, recursive ARIMA, recursive ARIMAX, etc. Some implementations in Python (e.g. in statsmodels' time series analysis section) might already be written recursively, but I'm not sure. There is a connection between ARIMA models and Kalman filters on differenced time series data (ARIMA models can be cast in state-space form and filtered recursively), though the details escape me at the moment; essentially, I'm not sure you'll get that much more performance by moving to these instead of sticking with Kalman filters (and their more complicated extensions).
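
If I recall correctly, recent versions of statsmodels (0.11 or later) expose an extend() method on fitted state-space results, which pushes new observations through the already-estimated parameters without refitting on the full history. A rough sketch of that pattern (toy data, illustrative ARIMA order):

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Toy series standing in for one metric; the (1, 1, 1) order is illustrative.
history = np.cumsum(np.random.randn(500))
results = SARIMAX(history, order=(1, 1, 1)).fit(disp=False)

# When 3 new observations arrive, score them with the already-estimated
# parameters instead of refitting on all 503 points.
new_points = history[-1] + np.cumsum(np.random.randn(3))
extended = results.extend(new_points)
print(extended.predict())   # one-step-ahead predictions for the new points
```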

Finally, any algorithm has bounded computation time if you terminate the computation of the new model parameters early. If you're computing them by optimizing them via some iterative optimization algorithm when batches of new data come in, then you can terminate the optimization early and bound your computation time. This is generally the kind of algorithm you find in the "online learning" field, and indeed, any machine learning algorithm can be trained like this (basically, you just retrain every time $k$ new data points have arrived, and terminate after training for $m$ minutes). This is, of course, not exactly constant time per unit of performance, and if you terminate too early, you may get a terrible set of parameters. But the larger space of models and the greater model complexity that moving to iteratively optimized algorithms buys you can sometimes make up for this (for example, deep learning sits in this space -- though obviously at the far end of it). Moreover, if you can get the implementation to start the optimization from the previous model's already-trained parameters, it will usually converge much, much more quickly (I like to use this "trick", though it's a very simple observation).

A library that implements this kind of thing is creme-ml, which seems to be merging with scikit-multiflow at the time of this writing. Both claim to do machine learning on streaming data, which I would assume means you don't have to keep the old data (or the old predictions) around. I've never used either though, so YMMV.
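
As a concrete sketch of that retrain-on-batches pattern with warm starting, scikit-learn's SGDRegressor exposes partial_fit, which updates the existing coefficients using only the new batch; the feature construction here is purely illustrative:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(learning_rate="constant", eta0=0.01)

def handle_batch(times, values):
    """Predict each point in the new batch, then fold the batch into the model.
    Cost depends only on the batch size, not on the length of the history."""
    times = np.asarray(times, dtype=float)
    X = np.column_stack([np.ones_like(times), times])   # toy features: intercept + raw time
    # Predict before updating, so predictions only use earlier observations.
    preds = model.predict(X) if hasattr(model, "coef_") else np.full(len(times), np.nan)
    model.partial_fit(X, values)                         # warm-started update on the new batch only
    return preds
```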

TheIntern

I have developed a commercially available (not in R) computational procedure that takes a user-suggested model as input and adds new data to the old data. The user has the option to use the input model as is, to re-estimate the model parameters, or to augment/change the input model based upon the empirically identified impact of the new data. You should be able to do the same. The predictive model that I use is called a transfer function, an XARMAX model, or a dynamic regression.

IrishStat