
I have a time series forecast along with actual historical data, and its accuracy (MAPE, probability coverage, etc.) has been calculated. Now I want to estimate how improving some or all of these accuracy measures would affect a business KPI (which can be calculated directly from the forecast). My idea is to simulate multiple forecasts with a fixed accuracy (e.g. MAPE = current_MAPE - 1%) and obtain an empirical distribution of the KPI. What would be a proper approach to generating such forecasts?

1 Answer


I'm not sure what you mean by "fixed accuracy" in this context. A fixed accuracy would imply that, since the accuracy is always the same (and known after the first few forecasts), we could adjust for it to recover the original values of the time series, and we would end up with a perfect forecast instead.

I assume what you really want is to simulate forecasts whose accuracy (and hence, by implication, forecast error) is drawn from a known distribution.

In this case you have two approaches:

  • (1) Assume your forecast error $\epsilon_t$ follows a known distribution $P$ (Normal, Gamma, etc.). Then do the following:

    • Generate a forecast at time $t$, $\hat{Y}_t$.
    • Draw a random value of $\epsilon_t$ from $P$.
    • Add the error term $\epsilon_t$ to your forecast value $\hat{Y}_t$ to get a new sample of your future time series at time $t$.
    • Feed that sample $\hat{Y}_t + \epsilon_t$ back into your model and generate the forecast for the next step $t+1$, $\hat{Y}_{t+1}$.
    • Repeat this process (draw a random sample from $P$, then generate the next-step forecast) until you have reached $\hat{Y}_{t+T}$, where $T$ is the number of steps ahead you want to forecast.
    • Repeat the above process enough times (50, 100, ...), each time using different values of $\epsilon$, to create enough samples of your future time series to get a good estimate of how your forecast distribution affects your KPIs and decisions (a rough sketch of this simulation loop is given after this list).
    • Alternatively, instead of assuming a distribution $P$, you can use the historical forecast errors (if you have already been running your model for quite a while, or maybe through backcasting if you have enough history).
  • Or (2), perform a full density forecast: This approach is more rigorous than the one above, but it is also more complicated to implement. The idea is that instead of estimating a model of your actual values $Y_t$, you estimate a model of the full distribution $P(Y_t)$ from your historical data. This can be either parametric or non-parametric. Once you have an estimate of the distribution, you can generate the sample paths directly by sampling from it, as opposed to sampling from the error and then adding it back to get your sample paths. Depending on the complexity of the distribution, you can either sample from it directly, or you might have to use an MCMC method to sample from it.
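Here is a minimal sketch of approach (1), assuming a Normal error distribution and a simple exponential-smoothing forecaster as a stand-in for whatever model you actually use; the function names, the parameter values, and the kpi() definition are illustrative assumptions, not something taken from the question:

```python
import numpy as np

rng = np.random.default_rng(42)

def one_step_forecast(history, alpha=0.3):
    """Placeholder model: simple exponential smoothing, one step ahead.
    Substitute your real forecasting model here."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def simulate_paths(history, horizon, n_paths, error_sigma):
    """Approach (1): at each step, draw epsilon_t from an assumed distribution P
    (here Normal(0, error_sigma)), add it to the one-step forecast, and feed the
    result back into the model to produce the next step."""
    paths = np.empty((n_paths, horizon))
    for i in range(n_paths):
        hist = list(history)
        for h in range(horizon):
            y_hat = one_step_forecast(hist)
            eps = rng.normal(0.0, error_sigma)   # epsilon_t ~ P
            y_sim = y_hat + eps                  # simulated future observation
            paths[i, h] = y_sim
            hist.append(y_sim)                   # feed the sample back in
    return paths

def kpi(path):
    """Stand-in for the business KPI, assumed to be computable from a forecast path."""
    return path.sum()

# Hypothetical usage: 24 periods of history, a 12-period horizon, 100 sample paths.
history = 100 + 10 * np.sin(np.arange(24) / 3) + rng.normal(0, 3, 24)
paths = simulate_paths(history, horizon=12, n_paths=100, error_sigma=5.0)
kpi_dist = np.array([kpi(p) for p in paths])     # empirical distribution of the KPI
print(kpi_dist.mean(), kpi_dist.std())
```

To compare accuracy levels, you would rerun the simulation with a smaller error_sigma (or a different assumed $P$) and compare the resulting KPI distributions.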

Skander H.
  • Thanks for such a thorough reply! By fixed error I meant that I'd like to generate some random forecasts (not based on a specific model, just a sequence of random values) with a constraint that the average error relative to the actual data would be the same, e.g. 10%. In other words, when drawing N variables from known distributions, how can I ensure that these N variables satisfy some additional constraint (total error = const)? – Dmitry Shopin Jan 10 '20 at 17:39
  • @DmitryShopin you seem to be mixing two concepts. First you say "average error relative to actual data would be the same", then you say "satisfy some additional constraint (total error = const)". These are not equivalent statements: the average error can be 10% and you can still sample 9%, 8%, 11%, then 9% again, etc. But when you say the errors must sum up to 10%, then once you sample 9% you will no longer be able to sample it again, since any future values will have to be 1% or lower (i.e. you're doing some sort of continuous equivalent of sampling without replacement). – Skander H. Jan 10 '20 at 19:57
  • The first case is pretty straightforward and is already covered by the approaches I mentioned. The second case will be much, much more complicated to implement, but more importantly, it's somewhat "unnatural", unless you expect there to be some sort of correlation/causal relation between your errors. As much as possible, we try to have the errors be independent. If you have some unusual case where they are not independent, then you could maybe look at volatility models and GARCH models (but I'm not an expert in those topics). – Skander H. Jan 10 '20 at 21:11
  • @Skander H. - Reinstate Monica Probably I should have formulated it more clearly... By "average error" I meant the same as "total error", which is not right. So I'll rephrase: "ABS(forecast_i - actual_i) can be anything (drawn from some distribution), but SUM[ABS(forecast_i - actual_i)] / N must be const". Does that make sense? – Dmitry Shopin Jan 11 '20 at 00:10
  • @DmitryShopin just to confirm : you mean the second case that I mentioned "satisfy some additional constraint (total error = const)", have I understood correctly? – Skander H. Jan 11 '20 at 00:23
  • @Skander H. - Reinstate Monica. Right, it is about the second case. Would it be conceptually wrong to draw errors for individual time points from, e.g., normal distribution(s) and then keep only those sets of errors that give the desired average error? – Dmitry Shopin Jan 11 '20 at 00:39
  • @DmitryShopin It would be wrong, because you are introducing selection bias into your estimates. If that condition is necessary (now I'm really curious, what is your use case?) then you are dealing with correlated errors. I don't have much experience with correlated errors, but I can tell you two things: (1) If the errors are correlated, that means that there is still deterministic information in your time series that your model isn't capturing - you're better off trying to improve the model up to a point where you have only independent errors remaining - – Skander H. Jan 11 '20 at 00:57
  • In fact some people apply exactly that approach: They use a straightforward model like Holt-Winters to generate a baseline forecast, and then apply something more exotic like a neural network or SVM to try to capture any additional information that remains in the residuals. – Skander H. Jan 11 '20 at 00:59
  • (2) Various volatility models like GARCH try to capture the variance (strictly speaking not the error, but conceptually close enough) - if maintaining that dependency between the errors is a hard requirement for your use case, then you might look at those models - but I don't have enough experience to help you any further with that approach. – Skander H. Jan 11 '20 at 01:01
  • I'm trying to answer a "simple" question from business - "how much money will we save reducing the error of our forecast by 1%?" – Dmitry Shopin Jan 11 '20 at 01:03
  • @DmitryShopin interesting. I'm pretty sure that you can get by with an average forecast error assumption (i.e. the first scenario) instead of the hard constraint you are trying to impose on your error distribution. Individual error values are stochastic, but you should be able to compare the impact of an error distribution with an average error of 10% vs the impact of an error distribution with an average error of 9% without imposing any additional conditions (a rough sketch of such a comparison is appended after these comments). – Skander H. Jan 11 '20 at 01:15
  • @DmitryShopin Note however that you are making the assumption that the cost of over-forecasting and the cost of under-forecasting are the same, but that is rarely the case in real-world business scenarios in my experience. Usually they are very different, and you should weigh your forecast errors differently (maybe you can do that with quantile forecasts?) or, better still, look at analyzing and optimizing the decision process that consumes the forecast, and then work backwards from there. – Skander H. Jan 11 '20 at 01:17
  • Here is an example from my world of inventory and demand forecasting: https://www.lokad.com/accuracy-gains-(inventory) – Skander H. Jan 11 '20 at 01:19
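To make the comparison discussed in the comments concrete ("how much money will we save reducing the error of our forecast by 1%?"), one possible sketch is to scale a Normal error distribution so that its expected absolute percentage deviation matches each target average error, then evaluate the KPI distribution at each level. The asymmetric cost numbers and the cost_kpi() function below are made-up illustrations (tying in to the comment about over- vs under-forecasting costs), not something from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

def cost_kpi(forecast, actual, over_cost=1.0, under_cost=3.0):
    """Stand-in KPI: asymmetric cost of forecast error, e.g. a holding cost when
    over-forecasting vs. a stock-out cost when under-forecasting. The unit costs
    are made-up numbers purely for illustration."""
    diff = forecast - actual
    return np.sum(np.where(diff > 0, diff * over_cost, -diff * under_cost))

def kpi_distribution(forecast, mape_target, n_sims=2000):
    """Simulate 'actuals' whose expected absolute percentage deviation from the
    forecast equals mape_target (for a zero-mean Normal error, E|e| = sigma *
    sqrt(2/pi), so sigma = mape_target * sqrt(pi/2)), then evaluate the KPI on
    each simulated series. Note this measures the deviation relative to the
    forecast rather than the actual, so it only approximates MAPE."""
    sigma = mape_target * np.sqrt(np.pi / 2.0)
    kpis = np.empty(n_sims)
    for i in range(n_sims):
        actual = forecast * (1.0 + rng.normal(0.0, sigma, size=forecast.shape))
        kpis[i] = cost_kpi(forecast, actual)
    return kpis

forecast = np.linspace(90, 110, 12)   # hypothetical 12-period point forecast
for mape in (0.10, 0.09):
    kpis = kpi_distribution(forecast, mape)
    print(f"MAPE {mape:.0%}: mean cost {kpis.mean():.1f}, 95th percentile {np.percentile(kpis, 95):.1f}")
```

The difference between the two KPI distributions (means, tails) is then one rough, assumption-laden estimate of what a one-point accuracy improvement would be worth.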