MAPE is better but MAE is worse in regression models

Question

I am working on a regression problem to predict the price of a vehicle based on its features. I have run several trials, but in one of them MAPE (Mean Absolute Percentage Error) is better while MAE (Mean Absolute Error) is worse:

Model 1:
MAE: 1857
MAPE: 0.46

Model 2:
MAE: 2160
MAPE: 0.40

Model 1 does not have any temporal features whereas in model 2, I have engineered some features based on time components.

From my understanding, MAE and MAPE are directly proportional, so I am not sure why this is happening in my case.

Tim · Accepted Answer · 2020-09-28T19:26:58.097

Those metrics are not "directly proportional". MAE is defined as $\frac{1}{n} \sum_{i=1}^n \left|y_i - \hat{y_i}\right|$, while MAPE is defined as $\frac{1}{n} \sum_{i=1}^n \left|\frac{y_i - \hat{y_i}}{y_i}\right|\times 100$. The difference is that for MAPE each error is taken relative to the actual value $y_i$. So for MAE every error has the same "weight" in the final result, while for MAPE the weights differ depending on the magnitude of $y_i$ (a small error on a large value counts for less than a large error on a small value, and so on). The two metrics can therefore diverge, as in your example.
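
One way to make the weighting explicit is to rewrite the MAPE definition above as a weighted version of MAE. This is only a rearrangement of that formula, assuming every $y_i \neq 0$; the symbol $w_i$ is introduced here purely for illustration:

$$\text{MAPE} = \frac{100}{n} \sum_{i=1}^n w_i \left|y_i - \hat{y_i}\right|, \qquad w_i = \frac{1}{\left|y_i\right|}$$

Setting every $w_i = 1$ and dropping the factor of 100 gives back MAE, so a model can lower MAPE by shrinking its errors on small $y_i$ even if its errors on large $y_i$ grow enough to raise MAE.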

To give a simple numerical example (in Julia), imagine that you are predicting only two values, a small one and a big one. We'll compare two predictions: in the first case the big value is off, in the second case the small value is off, and in each case the absolute difference is the same. MAE will be the same in both cases, while MAPE will differ significantly.

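# Both metrics average over the n observations; MAPE is kept as a fraction (not multiplied by 100)
mae(y, yhat) = sum(abs.(y .- yhat)) / length(y)
mape(y, yhat) = sum(abs.((y .- yhat) ./ y)) / length(y)
y = [1, 100]

# The big value is off by 5
yhat = [1, 105]
mae(y, yhat), mape(y, yhat)
## (2.5, 0.025)

# The small value is off by 5
yhat = [6, 100]
mae(y, yhat), mape(y, yhat)
## (2.5, 2.5)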

You may also be interested in the thread "What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?".

Thank you for the great explanation. – user3447653 Sep 28 '20 at 19:26

score 1 · Answer 2 · answered Sep 28 '20 at 19:38

Another example:

Observed Values $ (O_i) = \{5, 7, 10\} $

Fitted Values of Model 1 $ (F_{1,i}) = \{6, 6, 6\} $

Fitted Values of Model 2 $ (F_{2,i}) = \{7.5, 7.5, 7.5\} $

$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| F_i - O_i \right| $

$ MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{F_i - O_i}{O_i} \right| $

$ MAE_1 = 2.0 $

$ MAPE_1 \approx 24.8\% $

$ MAE_2 \approx 1.83 $

$ MAPE_2 \approx 27.4\% $

$ MAE_1 > MAE_2 $ but $ MAPE_1 < MAPE_2 $
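
For anyone who wants to check these figures, here is a minimal Julia sketch that recomputes both metrics from the observed and fitted values listed above. The variable names `O`, `F1`, `F2` and the helper functions `mae` and `mape` are chosen here for illustration only, and MAPE is computed as a fraction and multiplied by 100 just for printing.

```julia
# Observed values and the two constant fits from the example above
O  = [5, 7, 10]
F1 = fill(6.0, 3)   # Model 1 predicts 6 for every observation
F2 = fill(7.5, 3)   # Model 2 predicts 7.5 for every observation

# Mean absolute error
mae(o, f) = sum(abs.(f .- o)) / length(o)

# Mean absolute percentage error, returned as a fraction
mape(o, f) = sum(abs.((f .- o) ./ o)) / length(o)

for (name, F) in (("Model 1", F1), ("Model 2", F2))
    println(name, ": MAE = ", round(mae(O, F), digits=2),
            ", MAPE = ", round(100 * mape(O, F), digits=1), "%")
end
```

Model 2 comes out ahead on MAE while Model 1 comes out ahead on MAPE, because Model 2's largest error falls on the smallest observed value, where the percentage penalty is highest.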
