Optimization of Mean Absolute Error with regularization

Question

I have two different weather forecasting systems. Each system returns values between 0 and 30 degrees. In addition, I have a ground-truth set containing the actual temperature values. I want to find the optimal mix of both systems by solving an optimization problem, i.e., by minimizing:

$$ \sum_{d=1}^D \big|t_d-(w_{S1}\widehat{t_{d,S1}}+w_{S2}\widehat{t_{d,S2}})\big| \to \min $$

(where $d$ refers to days, $t$ to actual or predicted temperatures and $S1, S2$ to my two forecasting systems), under the constraint that

$$ w_{S1}+w_{S2} = 1. $$

Unfortunately, this tends to overfit (even with cross-validation), and the final predicted values are worse than expected.

Therefore, I thought about adding a regularization term, for example

$$ \dots + \lambda\big(w_{S1}^2+w_{S2}^2\big) $$

to my equation.

The rationale is that the two systems then tend toward a 50/50 mix, and I think that such a mix would perform better on real-world data than giving one system a large weight and the other a small one (i.e., when the two weights are far apart).
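
A short derivation (added for illustration, using only the constraint and penalty stated above) shows why the penalty pulls toward equal weights. Substituting $w_{S2} = 1 - w_{S1}$ into the penalty gives

$$ \lambda\big(w_{S1}^2+(1-w_{S1})^2\big) = \lambda\Big(2\big(w_{S1}-\tfrac{1}{2}\big)^2+\tfrac{1}{2}\Big), $$

which is smallest at $w_{S1}=w_{S2}=\tfrac{1}{2}$. A larger $\lambda$ therefore shrinks the weights toward the 50/50 mix rather than toward zero.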

Does this make sense? Is the regularization suitable? Are there other options?

score 2 · Accepted Answer · answered May 11 '17 at 12:09

Regularizing the weights sounds like a good idea. Alternatively, you could take a look at simply averaging your two component forecasts, i.e., $w_{S1}=w_{S2}=0.5$.

You may be interested in a recent paper: "The forecast combination puzzle: A simple theoretical explanation" (Claeskens et al., 2016, IJF). The "puzzle" they refer to is the fact that a straight average of component forecasts surprisingly often outperforms averaging with optimized weights. Their explanation essentially boils down to the bias-variance tradeoff in the optimization of the weights. They don't explicitly mention regularization as a possible remedy, but it is certainly not far-fetched.

(Just a note: you fit your model using MAE - I assume you are also assessing forecast quality using the same KPI, right? If you optimize the MAE but then assess using MSE, your optimization may not target the actual quality measure you are interested in. This is mainly relevant if the future distributions are asymmetric, of course.)
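
The MAE/MSE distinction can be seen in a small sketch. It relies on the standard fact that, for a single constant forecast, the median minimizes the MAE and the mean minimizes the MSE. The sample values are hypothetical and chosen to be right-skewed; the helper names `mae` and `mse` are illustrative only.

```python
from statistics import mean, median

# Hypothetical, right-skewed sample of observed temperatures (illustrative only).
temps = [10.0, 11.0, 12.0, 13.0, 29.0]


def mae(forecast, actuals):
    """Mean absolute error of a single constant forecast."""
    return sum(abs(a - forecast) for a in actuals) / len(actuals)


def mse(forecast, actuals):
    """Mean squared error of a single constant forecast."""
    return sum((a - forecast) ** 2 for a in actuals) / len(actuals)


med = median(temps)  # the MAE-optimal constant forecast
avg = mean(temps)    # the MSE-optimal constant forecast

# Each forecast wins under its own error measure and loses under the other.
print("MAE:", mae(med, temps), "(median) vs", mae(avg, temps), "(mean)")
print("MSE:", mse(med, temps), "(median) vs", mse(avg, temps), "(mean)")
```

With symmetric data the median and mean coincide, so the two criteria agree; the gap only opens up under skew.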

Thank you for the comment. I will check the recommended paper. Yes, I'm benchmarking my system using MAE. Unfortunately, the straight average performs badly. In addition, I want to determine optimal (and individual) weights for several cities in my example. Do you think that an L2 regularization is suitable? And is it a problem that w is not a vector but a single value in my example? I only found examples using vectors. — J-H, May 11 '17 at 12:18

$\ell_2$ regularization makes sense. $\ell_1$ would probably force one component to be exactly zero. You might want to look at the elastic net, which combines $\ell_1$ and $\ell_2$ penalties. No, it's no problem that your $w$ is a small vector with only two entries. You just include $\|w\|_2^2$, which happens to equal $w_{S1}^2+w_{S2}^2$. — Stephan Kolassa, May 11 '17 at 12:30
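
A minimal sketch of the penalized objective from the question, assuming the weights are restricted to $[0, 1]$ (an extra assumption, not stated in the question) and using a plain grid search, since the constraint $w_{S1}+w_{S2}=1$ leaves only one free weight. The function names `penalized_abs_error` and `fit_weight` and the sample data are hypothetical.

```python
def penalized_abs_error(w, actual, pred_s1, pred_s2, lam):
    """Sum of absolute errors of the blend plus lam * (w_S1^2 + w_S2^2)."""
    w2 = 1.0 - w  # enforces w_S1 + w_S2 = 1
    abs_err = sum(
        abs(t - (w * a + w2 * b)) for t, a, b in zip(actual, pred_s1, pred_s2)
    )
    return abs_err + lam * (w ** 2 + w2 ** 2)


def fit_weight(actual, pred_s1, pred_s2, lam, steps=1000):
    """Grid search for w_S1 over [0, 1]; w_S2 is then 1 - w_S1."""
    grid = [i / steps for i in range(steps + 1)]
    return min(
        grid, key=lambda w: penalized_abs_error(w, actual, pred_s1, pred_s2, lam)
    )


# Hypothetical data for one city (illustrative only).
actual = [12.0, 18.0, 25.0, 9.0, 14.0]
pred_s1 = [12.5, 17.0, 24.0, 10.0, 15.5]
pred_s2 = [15.0, 22.0, 20.0, 5.0, 11.0]

for lam in (0.0, 1.0, 10.0, 100.0):
    w_s1 = fit_weight(actual, pred_s1, pred_s2, lam)
    print(f"lambda={lam}: w_S1={w_s1:.3f}, w_S2={1.0 - w_s1:.3f}")
```

Because the error term is a sum over days while the penalty is not, a useful value of `lam` depends on the number of days; it would normally be chosen on held-out data, separately for each city if individual weights are wanted.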
