
According to the Wikipedia article on root-mean-square deviation (http://en.wikipedia.org/wiki/Root-mean-square_deviation), two approaches are widely used to normalise the RMSE.

The first is dividing by the range:

$$NRMSE = \frac{RMSE}{y_{max} - y_{min}}$$

and the second by the mean:

$$CV(RMSE) = \frac{RMSE}{\bar{y}}$$

These two methods can give very different results.
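To make the comparison concrete, here is a minimal sketch of both normalisations in plain Python. The data values and the helper names `rmse`, `nrmse_range` and `cv_rmse` are made up for illustration; they do not come from the Wikipedia article or from any library.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def nrmse_range(y_true, y_pred):
    """RMSE divided by the range of the observed values."""
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))

def cv_rmse(y_true, y_pred):
    """RMSE divided by the mean of the observed values."""
    return rmse(y_true, y_pred) / (sum(y_true) / len(y_true))

# Illustrative (made-up) observations and predictions.
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [11.0, 11.0, 15.0, 15.0, 19.0]

r = rmse(y_true, y_pred)
by_range = nrmse_range(y_true, y_pred)
by_mean = cv_rmse(y_true, y_pred)

print(f"RMSE     = {r:.4f}")
print(f"NRMSE    = {by_range:.4f}")
print(f"CV(RMSE) = {by_mean:.4f}")
```

With this toy data every residual has absolute value 1, so the RMSE is 1, while the range is 8 and the mean is 14. The two normalised values therefore already differ by almost a factor of two.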

What are the differences between these normalisations, and is one of them preferable?

kamilazdybal
BBrill
    There are even more options: http://stats.stackexchange.com/questions/131267/weka-result-interpretation/131273#131273 – Tim Mar 20 '15 at 11:52

1 Answer


$CV(RMSE)$ does not seem very sensible. It depends on the location rather than the scale of the data, whereas the RMSE itself is unlikely to depend on location (at least in simple cases such as multiple regression, unless the intercept is restricted to zero). If $\bar y$ is very close to zero, $CV(RMSE)$ will be very large regardless of the $RMSE$ itself.
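A rough sketch of the location argument, assuming that shifting the data by a constant shifts the predictions by the same constant (as it would in a regression with a free intercept). The data and the helper `shifted_metrics` are hypothetical and only serve the illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def shifted_metrics(y_true, y_pred, shift):
    """Return (RMSE, RMSE/range, RMSE/mean) after adding `shift` to both series."""
    yt = [v + shift for v in y_true]
    yp = [v + shift for v in y_pred]
    r = rmse(yt, yp)
    mean = sum(yt) / len(yt)
    return r, r / (max(yt) - min(yt)), r / mean

# Illustrative (made-up) observations and predictions.
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [11.0, 11.0, 15.0, 15.0, 19.0]

# Same residuals each time; only the location of the data changes.
for shift in (0.0, -13.5, 986.0):
    r, by_range, by_mean = shifted_metrics(y_true, y_pred, shift)
    print(f"shift={shift:7.1f}  RMSE={r:.3f}  "
          f"RMSE/range={by_range:.3f}  RMSE/mean={by_mean:.4f}")
```

The RMSE and the range-normalised value stay the same under every shift, while the mean-normalised value moves from roughly 0.07 (mean 14) to 2 (mean 0.5) to 0.001 (mean 1000), even though the fit is equally good in all three cases.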

$NRMSE$ seems more sensible, although it is clearly not robust to outliers, since the range is determined entirely by the two most extreme observations.

I would consider using $\frac{RMSE}{\sigma_y}$ instead (what could be its name?). Scaling by the standard deviation would make the measure comparable across variables measured on different scales.
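As a quick check of that last claim, here is a small sketch of $\frac{RMSE}{\sigma_y}$ under a change of units. It assumes $\sigma_y$ is the sample standard deviation (`statistics.stdev`); the population version would work the same way. The function name `rmse_over_sd` and the data are invented for the example.

```python
import math
import statistics

def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def rmse_over_sd(y_true, y_pred):
    """RMSE divided by the sample standard deviation of the observations.

    Assumption: sigma_y is taken to be the sample (n - 1) standard deviation.
    """
    return rmse(y_true, y_pred) / statistics.stdev(y_true)

# Illustrative (made-up) observations and predictions, e.g. in kilograms.
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [11.0, 11.0, 15.0, 15.0, 19.0]

# The same data expressed in different units, e.g. grams.
factor = 1000.0
y_true_g = [v * factor for v in y_true]
y_pred_g = [v * factor for v in y_pred]

print("RMSE (original units):", rmse(y_true, y_pred))
print("RMSE (rescaled units):", rmse(y_true_g, y_pred_g))
print("RMSE/sd (original):   ", rmse_over_sd(y_true, y_pred))
print("RMSE/sd (rescaled):   ", rmse_over_sd(y_true_g, y_pred_g))
```

The raw RMSE grows by the conversion factor, whereas the ratio to the standard deviation is unchanged. Because the standard deviation is also unaffected by adding a constant, this ratio does not have the location problem of $CV(RMSE)$.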

Richard Hardy