Let's say one wants to fit a model to a daily financial time series for prediction (e.g. ARIMA, SVM). If the data are stationary, then ideally the longer the time series, the better. In practice, I don't feel comfortable blindly trusting stationarity tests (e.g. KPSS, ADF). For example, at the 90% confidence level both KPSS and ADF confirm that the following time series is stationary, even though it qualitatively doesn't seem to be homoscedastic.
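For reference, both tests are available in the tseries package. A minimal sketch on placeholder data (note that the two tests have opposite null hypotheses, so "confirming stationarity" means ADF rejects its unit-root null while KPSS fails to reject its stationarity null):

```r
# Minimal sketch; the data are a hypothetical stand-in for the daily series.
library(tseries)

set.seed(1)
x <- rnorm(1000)  # placeholder: substitute your own series

adf.test(x)   # H0: unit root; a small p-value suggests stationarity
kpss.test(x)  # H0: level stationarity; a small p-value suggests non-stationarity
```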
Which quantitative methods exist to identify a reasonable starting date for the time series in terms of prediction quality (i.e. minimum test error, low variance of the prediction)? Please refer to R packages when possible.
My attempts:
(i) A brute-force approach could consist of repeating the fit for every candidate length of the time series (e.g. 1y, 1y+1d, ..., 5y) and keeping the length with the lowest test error; a rough sketch is given after point (ii). However, this approach is far too expensive at daily resolution.
(ii) Apply stationarity tests (ADF, KPSS) to the time series at the minimum allowed length and extend the length until the tests reject stationarity (sketched below as well). The problems with this approach are multiple: (a) it is extremely dependent on the chosen confidence level of the test (e.g. 95% or 80%); (b) stationarity tests are unable to identify the regime changes that may occur over long financial time series.
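To illustrate (i), here is a coarse sketch assuming auto.arima() from the forecast package as the model and a fixed hold-out set; the grid of lengths, the horizon, and the data are all hypothetical placeholders:

```r
library(forecast)

set.seed(1)
x <- cumsum(rnorm(1500))                      # placeholder for the daily series

n_test  <- 60                                 # fixed hold-out of the last 60 days
train   <- head(x, -n_test)
test    <- tail(x,  n_test)
lengths <- seq(250, length(train), by = 250)  # coarse grid: ~1y, ~2y, ... of trading days

rmse <- sapply(lengths, function(L) {
  fit <- auto.arima(tail(train, L))           # refit on the last L observations only
  sqrt(mean((forecast(fit, h = n_test)$mean - test)^2))
})
lengths[which.min(rmse)]                      # training length with the lowest test RMSE
```

A fine-grained grid (1y, 1y+1d, ...) would mean thousands of refits, which is what makes the approach expensive.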
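And a sketch of (ii) using tseries::kpss.test; the step size, minimum length, and significance level alpha are arbitrary choices, which is exactly problem (a):

```r
library(tseries)

set.seed(1)
x <- rnorm(1500)        # placeholder for the daily series

alpha <- 0.05           # the outcome is very sensitive to this choice
step  <- 250            # extend by ~1 trading year at a time
L     <- 250            # minimum allowed length

while (L + step <= length(x)) {
  p <- suppressWarnings(kpss.test(tail(x, L + step))$p.value)
  if (p < alpha) break  # KPSS rejects stationarity: stop extending the window
  L <- L + step
}
L                       # longest recent window still deemed stationary
```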
A closely related topic, though it doesn't provide automatic/quantitative procedures: Length of Time-Series for Forecasting Modeling
EDIT (2/Jul/2016): After further thought, I think an optimal approach could be to follow the principle "the larger the dataset, the better". After all, I guess a model whose performance depends heavily on the length of the time series could be considered a "bad" model. Rather than focusing on selecting an optimal length, one could focus on identifying features that work well under different regimes of the time series.