I am trying to forecast a daily time series in R.

The series is univariate and contains only daily sales figures, 365 observations per year over a four-year period.

My intention is to make predictions with machine learning models from the caret package.

To begin with, I am having trouble finding suitable regressors and feeding them into the models (regression, random forest, etc.).

Can anybody help me resolve this problem?

Jeremy Miles
j235
  • Build a time series model first, before doing anything with ML – Aksakal Feb 17 '20 at 20:38
  • I already have time series models. But modeling with ML is different – j235 Feb 17 '20 at 20:46
  • What's different? Do you already have regressors? – Aksakal Feb 17 '20 at 20:47
  • Yes. For example, I use all the models from the forecast package for univariate series, like hw, arima, tbats, etc. But here the modeling is different; you don't need to have regressors, etc. – j235 Feb 17 '20 at 20:49
  • You'll get better traction asking specific questions. Asking how to build time series forecasting with caret is too broad. Also, if you think ML doesn't require regressors, you're very wrong – Aksakal Feb 17 '20 at 21:22

1 Answer

While this is not a direct answer to your question, here are some reasons NOT to use ML methods for predicting time series data.

(1) To prepare a time series for an ML-based method, the series (size N) has to be repeatedly packed into blocks (size n), with one observation per block used as the response. This produces a dataset about n times larger, loses the n leading points as responses, and doesn't provide any new information.

(2) ML methods are almost invariably based on some form of cross-validation that ignores time ordering, which means that future data points can end up in the training partitions and past data points in the testing partitions.

(3) Incorporating covariates only exacerbates the problem of enlarging the data and losing the information at the same time.

(4) Using tree-based ML methods for regression would impair the predictions, since these methods can't predict beyond the range of response values observed in training (a problem for any series with a trend).

(5) Uncertainty quantification is difficult.
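To make points (1) and (2) concrete, here is a minimal sketch in plain Python (no libraries). The toy series and the helper names `make_lag_windows` and `rolling_origin_splits` are assumptions made up for illustration; they are not part of caret or any other package. The first helper packs a series of size N into blocks of size n with the next observation as the response; the second yields train/test index sets in which every test row comes after every training row, which is what a shuffled k-fold split does not guarantee.

```python
# Hypothetical toy series standing in for daily sales (values are made up).
series = [float(v) for v in range(1, 21)]  # N = 20 observations


def make_lag_windows(y, n):
    """Pack y into rows of n consecutive values; the response is the next value."""
    rows = [y[i:i + n] for i in range(len(y) - n)]
    responses = [y[i + n] for i in range(len(y) - n)]
    return rows, responses


def rolling_origin_splits(n_rows, initial, horizon):
    """Yield (train_idx, test_idx) where all test rows follow all train rows."""
    end = initial
    while end + horizon <= n_rows:
        yield list(range(end)), list(range(end, end + horizon))
        end += horizon  # grow the training window by one horizon each step


n = 3
X, t = make_lag_windows(series, n)

# The first n points never appear as responses, and each original value
# is now stored up to n times across the rows of X.
print(len(series), "observations become", len(X), "rows of", n, "lags each")

splits = list(rolling_origin_splits(len(X), initial=10, horizon=3))
for train_idx, test_idx in splits:
    print("train rows 0 to", train_idx[-1], "| test rows", test_idx)
```

For R users, caret ships a comparable splitter, `createTimeSlices`, along with `trainControl(method = "timeslice")`, so the time-ordering issue can be handled if the resampling scheme is chosen deliberately rather than left at its default.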
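Point (4) can be illustrated with a deliberately tiny regression tree written from scratch. This is a sketch under stated assumptions: `fit_tree` and `predict` are hypothetical helpers, not the algorithm used by any particular random forest package, and the trending data are made up. Because every leaf predicts the mean of the training responses that fell into it, no prediction can exceed the largest response seen in training, however far ahead the forecast is.

```python
def fit_tree(xs, ys, min_leaf=2):
    """Fit a tiny 1-D regression tree by recursive least-squares splitting."""
    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    pairs = sorted(zip(xs, ys))
    best = None
    # Try every split that leaves at least min_leaf points on each side.
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        # Too few points to split: the leaf predicts the mean response.
        return {"leaf": sum(ys) / len(ys)}
    i = best[1]
    return {
        "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
        "left": fit_tree([p[0] for p in pairs[:i]],
                         [p[1] for p in pairs[:i]], min_leaf),
        "right": fit_tree([p[0] for p in pairs[i:]],
                          [p[1] for p in pairs[i:]], min_leaf),
    }


def predict(tree, x):
    """Walk down the tree until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Made-up series with a linear trend: y = 2 * t + 5 for t = 0..29.
times = list(range(30))
values = [2.0 * t + 5.0 for t in times]
tree = fit_tree(times, values)

for future_t in (30, 40, 100):
    print("t =", future_t,
          "| tree prediction:", predict(tree, future_t),
          "| true trend value:", 2.0 * future_t + 5.0)
```

All three future time points land in the same rightmost leaf, so the forecast is flat while the true trend keeps rising. Differencing or detrending the series before fitting is a common workaround, though it is a separate modelling decision.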

Thus, I'd stay away from ML when working with time series data. This does not apply to structured serial data, for example in NLP or the like. However, the methods and goals there are different from (classical) time series problems, and there are well-established, application-focused solutions that are not based on the generic or vanilla methods found in ML libraries (Python or R alike).

Some aspects of this issue, of ML use for TS prediction, have already been addressed here: Times series analysis vs. machine learning?

dnqxt