
We have a number of providers, each of which forecasts wind power generation per country per date.

Values are forecast up to one week ahead.

Forecasts may be compared with actual values of reported wind power generation for a particular date.

The error decreases as the forecast time distance decreases.

I would like to construct a simple model which will give me a level of belief allocated to each of the providers based on their forecast (and the history of their forecasts).

I am a Bayesian newbie, reading through Kruschke's excellent book, and thought that a linear regression where $x_{i}$ are the forecasts and $y$ is the actual value would be appropriate. Do you agree? Any tips for formulating this General Linear Model?

The previous forecast performance would give a distribution of error for each provider, which could be used as the prior? Then when new information is received, in this case a new forecast for a specific number of days ahead, we could update the overall probability of the forecast out-turn by considering all the ensemble probabilities. Is this a reasonable approach?

chris
  • Does each of the providers make single or multiple forecasts (e.g. you have the $i$-th provider's forecast for the $j$-th turbine, $x_{ij}$, with multiple values per turbine and per provider)? – Tim May 17 '16 at 10:56
  • single forecasts. i'm looking at forecast per country at the moment – chris May 17 '16 at 10:59
  • Could you provide data-example for your question? – Tim May 17 '16 at 11:04
  • not really as this data is on licence from providers. i will try and add more detail to the question – chris May 17 '16 at 12:27
  • You want to create one forecast from many forecasts of different providers? – user31264 May 21 '16 at 23:18
  • @user31264 yes that is correct, dynamically adjusting how the forecast is constructed as time progresses – chris May 23 '16 at 12:24

1 Answer


If you have several forecasts $x_{ij}$ (the $j$-th provider's forecast for the value $y_i$) and want to combine them, then a natural starting point is some kind of weighted mean of them, e.g.

$$ y_i = \sum_j w_j x_{ij} $$

where $w_j$ are weights such that $\sum_j w_j = 1$. In this case the weight $w_j$ can be interpreted as the $j$-th provider's reliability.
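As a minimal sketch of the weighted mean above (the function name `combine_forecasts`, the numbers and the weights are made up for illustration, not taken from the question):

```python
import numpy as np

def combine_forecasts(x, w):
    """Weighted mean of provider forecasts.

    x: array of shape (n_dates, n_providers), x[i, j] is provider j's forecast for date i.
    w: array of shape (n_providers,), weights that sum to 1.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    # Row i of the result is sum_j w[j] * x[i, j].
    return x @ w

# Hypothetical forecasts from three providers for two dates.
x = np.array([[100.0, 110.0, 90.0],
              [200.0, 190.0, 210.0]])
w = np.array([0.5, 0.25, 0.25])  # assumed reliabilities
y_hat = combine_forecasts(x, w)
```

With fixed weights this is all there is to it; the harder part is choosing the $w_j$.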

Your model seems to be more complicated than this, since you also want to include information about the forecast time distance $t_{ij}$. First, notice that you are interested not in the actual $y_i$'s but rather in the providers' errors. So let's define the variable $z_{ij} = y_i - x_{ij}$, the difference between the actual value $y_i$ and the prediction of the $j$-th provider. This difference depends on the provider's reliability $\alpha_j$ and the forecast delay $t_{ij}$, i.e.

$$ z_{ij} = \alpha_j + \beta t_{ij} + \varepsilon_{ij} $$

$\alpha_j$ is a random variable with mean $\bar \alpha_j$ (the additive bias) and variance $\sigma_{\alpha_j}^2$ (how much the errors of the $j$-th provider's forecasts vary). In this particular case I assumed that the effect of delay $\beta$ is the same for all providers, but you can make different assumptions. Knowing the values of these parameters, you can predict the expected error $\hat z_{ij}$ of a forecast that the $j$-th provider made $t_{ij}$ time points ahead. You can use these estimates to bias-correct individual forecasts: $\hat y_i = x_{ij} + \hat z_{ij}$.
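To make the error model concrete, here is a rough sketch that estimates $\alpha_j$ and $\beta$ by ordinary least squares. Note that this treats the $\alpha_j$ as fixed per-provider intercepts rather than as random variables, so it is a simplification of the model above and not a Bayesian fit. The function names and the noise-free toy data are assumptions for illustration only.

```python
import numpy as np

def fit_error_model(z, t, provider, n_providers):
    """Least-squares fit of z = alpha[provider] + beta * t.

    z: observed errors y - x, t: forecast delays,
    provider: integer provider index (0-based) for each observation.
    Returns (alpha, beta) with alpha of shape (n_providers,).
    """
    z = np.asarray(z, dtype=float)
    t = np.asarray(t, dtype=float)
    provider = np.asarray(provider, dtype=int)
    # Design matrix: one indicator column per provider, plus one column for t.
    design = np.zeros((z.size, n_providers + 1))
    design[np.arange(z.size), provider] = 1.0
    design[:, -1] = t
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coef[:-1], coef[-1]

def bias_corrected(x_new, alpha_j, beta, t_new):
    """Bias-corrected forecast: y_hat = x + z_hat, with z_hat = alpha_j + beta * t."""
    return x_new + alpha_j + beta * t_new

# Toy data: two providers, delays of 1..7 days, no noise (hypothetical values).
true_alpha = np.array([2.0, -1.0])
true_beta = 0.5
t = np.tile(np.arange(1.0, 8.0), 2)
provider = np.repeat([0, 1], 7)
z = true_alpha[provider] + true_beta * t

alpha, beta = fit_error_model(z, t, provider, n_providers=2)
y_hat = bias_corrected(100.0, alpha[0], beta, t_new=3.0)
```

On real data the errors are noisy, so the estimates will only approximate the underlying parameters, and a hierarchical (random-effects) model would additionally shrink the per-provider intercepts towards a common mean.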

Instead of bias correction, the absolute value of the inverted estimated error, $|\hat z_{ij}^{-1}|$, could be used to weight the different forecasts and form a weighted forecast

$$ \hat y_i = \frac{\sum_j |\hat z_{ij}^{-1}| x_{ij}}{ \sum_j{|\hat z_{ij}^{-1}|}} $$

Since each forecast in the combination is weighted by the inverse of its expected error (the more biased it is, the less weight it has), you could expect the combined forecast to be (on average) better than the individual forecasts.
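A small sketch of this inverse-error weighting for a single date follows. The `eps` floor is my own addition, not part of the formula: it guards against division by zero when a provider's expected error is exactly zero. The numbers are hypothetical.

```python
import numpy as np

def inverse_error_weights(z_hat, eps=1e-9):
    """Weights proportional to 1 / |z_hat|, normalised to sum to 1.

    eps is an assumed floor on |z_hat| to avoid dividing by zero.
    """
    inv = 1.0 / np.maximum(np.abs(np.asarray(z_hat, dtype=float)), eps)
    return inv / inv.sum()

# Expected errors of three providers for one date (hypothetical values).
z_hat = np.array([2.0, -4.0, 4.0])
x = np.array([100.0, 108.0, 92.0])  # their forecasts for that date

w = inverse_error_weights(z_hat)
y_hat = float(w @ x)
```

One thing to watch: a provider whose expected error is close to zero receives almost all of the weight, so in practice you may prefer a less extreme choice of floor or a smoother weighting scheme.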

In a Bayesian setting you can put priors on $\alpha_j$ and $\beta$ to estimate them and then, in a sequential analysis, use their posterior estimates as the priors for the next round of estimation when new data arrive (see Bayesian updating).
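To illustrate the "posterior becomes the next prior" idea in the simplest possible case, here is a sketch of a conjugate normal update for one provider's mean bias. It assumes a normal prior, normally distributed errors with a known noise variance, and it ignores the delay term $\beta t_{ij}$, so it is far simpler than the full model. All names and numbers are hypothetical.

```python
import numpy as np

def update_normal_mean(prior_mean, prior_var, errors, noise_var):
    """Posterior of a normal mean given a normal prior and known noise variance."""
    errors = np.asarray(errors, dtype=float)
    n = errors.size
    # Precisions (inverse variances) add up.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + errors.sum() / noise_var)
    return post_mean, post_var

# Sequential updating: the posterior after each batch is the prior for the next.
mean, var = 0.0, 100.0  # vague prior on the provider's bias (assumed)
for batch in ([3.0, 1.0], [2.0, 2.0, 4.0]):
    mean, var = update_normal_mean(mean, var, batch, noise_var=4.0)

# Processing all the errors in one go gives the same posterior.
mean_all, var_all = update_normal_mean(0.0, 100.0, [3.0, 1.0, 2.0, 2.0, 4.0], noise_var=4.0)
```

For the full model with unknown variances and a shared $\beta$ there is no such closed form in general, and you would typically fit it with MCMC, as in Kruschke's book.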

Tim