I was wondering what the differences are between Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE) for assessing the accuracy of a forecast. Which one is better? Thanks.
One is not a percentage, the other is a percentage... it depends on how you wish to evaluate your forecasts. – Zach Jun 06 '11 at 16:30
Actually, Zach gave a brief answer. I prefer the Theil decomposition and MAPE for the accuracy of point forecasts; percentages are much easier to interpret. You may use the `accuracy()` function in R for these options. Also consider MASE as a nice alternative... I will probably extend my short remark to a full answer later :) – Dmitrij Celov Jun 06 '11 at 18:32
Related: [What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?](https://stats.stackexchange.com/q/299712/1352) – Stephan Kolassa Jan 06 '22 at 09:31
2 Answers
MSE is scale-dependent, MAPE is not. So if you are comparing accuracy across time series with different scales, you can't use MSE.
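To make the scale dependence concrete, here is a minimal Python sketch. The numbers are made up for illustration, and `mse` and `mape` are hypothetical helper names, not functions from any package. Rescaling a series by 1000 (say, from thousands of units to units) multiplies MSE by a million but leaves MAPE unchanged.

```python
def mse(actual, forecast):
    """Mean squared error (in squared units of the data)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Needs nonzero actuals."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Illustrative data only (an assumption, not from the thread).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 380.0]

# The same series and forecasts expressed on a scale 1000 times larger.
scaled_actual = [1000 * a for a in actual]
scaled_forecast = [1000 * f for f in forecast]

print("MSE: ", mse(actual, forecast), "vs", mse(scaled_actual, scaled_forecast))
print("MAPE:", mape(actual, forecast), "vs", mape(scaled_actual, scaled_forecast))
```

The MSE values differ by a factor of 1000 squared, so averaging or comparing MSE across series of different magnitude is dominated by the largest series.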
For business use, MAPE is often preferred because apparently managers understand percentages better than squared errors.
MAPE can't be used when percentages make no sense. For example, the Fahrenheit and Celsius temperature scales have relatively arbitrary zero points, and it makes no sense to talk about percentages. MAPE also cannot be used when the time series can take zero values.
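Both problems can be seen in a small Python sketch. The temperatures are invented for illustration, and `mape` and `c_to_f` are hypothetical helpers. The same forecasts give a different MAPE depending on whether they are expressed in Celsius or Fahrenheit, and a single zero in the actual values makes MAPE undefined.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Illustrative temperatures in Celsius (an assumption, not real data).
actual_c = [5.0, 10.0, 20.0]
forecast_c = [6.0, 9.0, 22.0]

mape_c = mape(actual_c, forecast_c)
mape_f = mape([c_to_f(x) for x in actual_c], [c_to_f(x) for x in forecast_c])
print("MAPE in Celsius:   ", mape_c)
print("MAPE in Fahrenheit:", mape_f)  # same forecasts, different "accuracy"

# A zero actual value means dividing by zero, so MAPE cannot be computed.
undefined = False
try:
    mape([0.0, 10.0], [1.0, 9.0])
except ZeroDivisionError:
    undefined = True
print("MAPE undefined with a zero actual:", undefined)
```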
MASE is intended to be both independent of scale and usable on all scales.
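As a rough sketch of how this works, the non-seasonal form of MASE divides the forecast's mean absolute error by the in-sample mean absolute error of the one-step naive forecast (each value predicted by the previous one). The Python below assumes that non-seasonal form; `mase` is a hypothetical helper and the data are made up. It is not the implementation used by any package.

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error, non-seasonal form (a sketch).

    train:    in-sample history used to scale the errors
    actual:   observed values over the forecast horizon
    forecast: forecasts for the same horizon
    """
    # In-sample MAE of the naive forecast y_hat[t] = y[t-1].
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    # MAE of the forecast being evaluated.
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / naive_mae


# Illustrative series containing zeros, where MAPE would be undefined.
train = [0.0, 2.0, 0.0, 3.0, 1.0]
actual = [0.0, 2.0]
forecast = [1.0, 1.0]

value = mase(train, actual, forecast)
scaled_value = mase([1000 * x for x in train],
                    [1000 * x for x in actual],
                    [1000 * x for x in forecast])
print(value, scaled_value)  # unchanged by rescaling the series
```

A value below 1 means the forecast has a smaller average absolute error than the naive forecast had in-sample. The scaling term is zero only if the training series is constant, which is the one case this sketch does not handle.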
As @Dmitrij said, the `accuracy()` function in the `forecast` package for R is an easy way to compute these and other accuracy measures.
There is a lot more about forecast accuracy measures in my 2006 IJF paper with Anne Koehler.

I have just now edited the tag wikis for the [tag:mse], the [tag:mape] and the [tag:mase]. Hope these are approved soon - then I hope they provide some more information on the differences. – Stephan Kolassa Apr 15 '16 at 09:17
When comparing forecasts from different methods to find the best-fitting model, we can use MSE, MAPE and RMSE. The method with the lowest value of the chosen measure is the better model.