Choosing a metric for a regression problem: RMSE or MAE

Question

I have a regression task where I want to predict the number of products sold by a manufacturer. My data is a time series, with a certain number of units sold each day.

I'm slightly torn as to which metric to use. In the past I must confess I've used RMSE without thinking.

RMSE: since this penalizes predictions far from the truth, isn't this what you would want?
MAE: why would I use this instead of RMSE? I guess it is more intuitive.

But how do I choose between the two metrics? I don't see why you wouldn't want the model to penalize large deviations, as RMSE does.

score 2 · Answer 1 · answered Feb 04 '21 at 22:41

One way to think about the difference between these two metrics of model performance is to think about what the ideal fitted model corresponds to in the population. Let me start with MSE (RMSE is just a monotone transformation of MSE, so minimizing MSE is the same as minimizing RMSE, and working with MSE saves on notation).

For some predictor $m(X)$ of $Y$ which uses covariates $X$, we have

$$\begin{aligned}\mathbb E\left[(Y - m(X))^2\right] &= \mathbb E\left[\mathbb E\left[(Y - m(X))^2 \mid X\right] \right]\\ &= \mathbb E\left[\mathbb E\left[(Y - \mathbb E[Y \mid X])^2 + (\mathbb E[Y \mid X] - m(X))^2 \mid X\right] \right]\end{aligned}$$

The cross term $2(Y - \mathbb E[Y \mid X])(\mathbb E[Y \mid X] - m(X))$ drops out of the second line because, conditional on $X$, its second factor is fixed and its first factor has conditional mean zero.

In this decomposition, the first term does not depend on $m(X)$, while the second term is non-negative and is therefore minimized exactly when $m(X) = \mathbb E[Y \mid X]$. Thus, given a sufficiently rich model and a sufficiently large dataset, we should expect an MSE-minimizing regression to fit the conditional expectation function.

While it is a bit more tedious to show, it is also the case that the model which minimizes the MAE will be attempting to fit the conditional median function.
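As a rough numerical check of both claims, here is a minimal sketch in the simplest setting with no covariates, so the "model" is a single constant prediction. The exponential sample, the grid of candidate values, and all variable names are illustrative assumptions, not part of the argument above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Skewed sample, so that the mean and the median are clearly different.
# An odd sample size makes the sample median a single observation.
y = rng.exponential(scale=10.0, size=2001)

# Candidate constant predictions m, spaced 0.02 apart.
grid = np.linspace(0.0, 30.0, 1501)

# Residuals for every (candidate, observation) pair.
resid = y[None, :] - grid[:, None]
mse = (resid ** 2).mean(axis=1)
mae = np.abs(resid).mean(axis=1)

best_mse = grid[mse.argmin()]
best_mae = grid[mae.argmin()]

print(f"argmin MSE = {best_mse:.2f}, sample mean   = {y.mean():.2f}")
print(f"argmin MAE = {best_mae:.2f}, sample median = {np.median(y):.2f}")
```

Up to the grid resolution, the MSE-minimizing constant should land on the sample mean and the MAE-minimizing constant on the sample median.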

Given the above characterization, we can think about the relative merits of these two error metrics in terms of the relative merits of expected values versus medians. One reason to prefer means is that they are linear, and linear objects are typically easier to work with. For example, $\mathbb E[Y_1 | X] + \mathbb E[Y_2 | X] = \mathbb E[Y_1 + Y_2 | X]$, while the equivalent statement does not hold in general for conditional medians.

On the other hand, the median is much less sensitive to outliers, because absolute error does not penalize large deviations as heavily as squared error does. For example, suppose that a small minority of observations (say, $1\%$) are miscoded as -1 trillion. Even though they are a small proportion of the dataset, they will completely swamp the estimate of the mean. The median, by contrast, will barely move as a result of these miscoded observations.
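Both points can be illustrated with a small simulation. This is only a sketch on made-up data: the distributions, sample sizes, and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Linearity: holds for means, not in general for medians ---
y1 = rng.exponential(scale=1.0, size=10_001)
y2 = rng.exponential(scale=1.0, size=10_001)

mean_gap = abs(np.mean(y1 + y2) - (np.mean(y1) + np.mean(y2)))
median_gap = abs(np.median(y1 + y2) - (np.median(y1) + np.median(y2)))
print(f"mean gap   = {mean_gap:.2e}")
print(f"median gap = {median_gap:.2f}")

# --- Outliers: miscode 1% of the observations as -1 trillion ---
y = rng.normal(loc=100.0, scale=10.0, size=10_000)
y_bad = y.copy()
bad_idx = rng.choice(y.size, size=y.size // 100, replace=False)
y_bad[bad_idx] = -1e12

mean_shift = abs(y_bad.mean() - y.mean())
median_shift = abs(np.median(y_bad) - np.median(y))
print(f"mean shift   = {mean_shift:.3e}")
print(f"median shift = {median_shift:.3f}")
```

The mean of the corrupted sample is dragged to roughly 1% of -1 trillion, while the median only slides slightly down the bulk of the data.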

Thank you, so RMSE is better. Can I ask why anybody wouldn't want the metric to be sensitive to outliers (in the case of using MAE)? — mathella, Feb 05 '21 at 12:23
I think my example at the end speaks to that. That is obviously an extreme case, but the general point is that sensitivity to outliers can make results somewhat meaningless if data quality is poor. — stats_model, Feb 05 '21 at 17:05
There is a field of statistics that tries to formalize some of these desirable robustness properties (or lack thereof) of different statistics: https://en.wikipedia.org/wiki/Robust_statistics — stats_model, Feb 05 '21 at 17:27
