I'm looking to create a predictive model for time series data using historical data only (no other variables) and simple curve fits (linear, polynomial, exponential etc).

The issue is that I'm trying to use a single algorithm on thousands of heterogeneous data sets and each data set exhibits various "regimes" across time, so I'm looking for something that is adaptive.

For instance, if I have 10 years' worth of data, maybe year 1 is best fit by a 4th-degree polynomial, years 2 to 4 are best fit linearly, and years 4 to 10 by an exponential. Because I'm doing this over many data sets, I need the algorithm to choose the ideal fits on its own.
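To make the per-segment choice concrete, here is a rough sketch of the kind of selection step I have in mind. It assumes the segment boundaries are already known (finding the regimes is a separate problem) and it uses BIC to compare candidates, which is my own assumption rather than an established recipe. The helper names `bic` and `select_fit` are hypothetical, and the exponential is fit by least squares on the log scale, which only works for strictly positive data.

```python
import numpy as np

def bic(y, y_hat, k):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    # Assumption: clamp RSS so a perfect fit does not produce log(0).
    # The floor 1e-12 is arbitrary and depends on the scale of the data.
    rss = max(rss, 1e-12)
    return n * np.log(rss / n) + k * np.log(n)

def select_fit(t, y, max_degree=4):
    """Return (name, score) of the lowest-BIC candidate on one segment."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = {}
    # Polynomial candidates of degree 1 (linear) up to max_degree.
    for d in range(1, max_degree + 1):
        coefs = np.polyfit(t, y, d)
        scores[f"poly{d}"] = bic(y, np.polyval(coefs, t), d + 1)
    # Exponential candidate y = a * exp(b * t), fit on the log scale.
    # Skipped when the data are not strictly positive.
    if np.all(y > 0):
        b, log_a = np.polyfit(t, np.log(y), 1)
        scores["exp"] = bic(y, np.exp(log_a + b * t), 2)
    best = min(scores, key=scores.get)
    return best, scores[best]

# Usage on two synthetic segments (made-up data, for illustration only).
t = np.linspace(0.0, 5.0, 60)
print(select_fit(t, 3.0 + 2.0 * t)[0])
print(select_fit(t, 2.0 * np.exp(0.5 * t))[0])
```

The complexity penalty in BIC is what keeps a higher-degree polynomial from winning just because it has more free parameters. A good in-sample score says nothing about how a fit extrapolates, so this only covers the selection step, not the forecasting itself.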

Is there something more or less pre-built out there that already fulfills some of these needs? If not, where would you suggest I start in order to create something quickly? I have a good amount of Python dev experience, including data manipulation and visualization (pandas, matplotlib), and some sklearn experience.

  • you need something like auto.arima in R, see this thread on SO https://stackoverflow.com/questions/22770352/auto-arima-equivalent-for-python – Aksakal Oct 31 '17 at 18:17
  • Even better, you need a solution that will integrate memory use (history) while accommodating identifiable deterministic structure (pulses, level shifts, seasonal pulses, time trends) in the presence of non-homogeneous parameters/error variances over time without resorting to a list-based solution. Polynomials are a definite no-no when dealing with time series and should be studiously avoided. For more see https://stats.stackexchange.com/questions/225931/why-is-my-high-degree-polynomial-regression-model-suddenly-unfit-for-the-data/289499#289499 – IrishStat Oct 31 '17 at 18:38
  • +1 @IrishStat But I wonder if bamboo77's "no other variables" includes seasonal identifiers... that is, is she (or he) simply looking for a description of each time series against the clock, but not against theories of the clock? – Alexis Oct 31 '17 at 18:47
  • Seasonal identifiers, i.e. seasonal pulses, can be found from the data, so they are not user-specified and thus meet the requirement of "no other user-specified variables". The OP is explicit in not wanting to have to specify other variables, BUT analytics extracting time trends, level shifts, etc. are not prohibited and thus are fair game. When he said no other variables, he meant he did not want to explain variation with user-specified causal series. – IrishStat Oct 31 '17 at 19:49
  • Thanks for the ideas guys. I probably could've been clearer but @IrishStat is correct in his interpretation –  Nov 02 '17 at 14:58

0 Answers