In what circumstances does the RMSE formula have a $k$ in the denominator?

The StackOverflow question "What does RMS stand for?" shows this formula for RMSE:

$$RMSE=\sqrt{\frac1{n-k}\sum_i(y_i-\hat{y}_i)^2}$$

But most other sources don't include the $k$; see, for example, https://www.statisticshowto.datasciencecentral.com/rmse/ or http://statweb.stanford.edu/~susan/courses/s60/split/node60.html:

$$RMSE=\sqrt{\frac{\sum_{i=1}^n(\hat{y}_i-y_i)^2}{n}}$$
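
For concreteness, here's a quick numerical comparison of the two formulas, using made-up data and an assumed $k = 3$:

```python
import numpy as np

# Hypothetical observations and fitted values (n = 10), with k = 3 assumed
y = np.array([3.1, 4.2, 5.0, 6.1, 7.3, 8.0, 9.2, 10.1, 11.0, 12.2])
y_hat = np.array([3.0, 4.0, 5.2, 6.0, 7.0, 8.3, 9.0, 10.0, 11.3, 12.0])
n, k = len(y), 3

sse = np.sum((y - y_hat) ** 2)     # sum of squared errors
rmse_n = np.sqrt(sse / n)          # denominator n (second formula)
rmse_nk = np.sqrt(sse / (n - k))   # denominator n - k (first formula)

print(rmse_n, rmse_nk)  # the n - k version is always the larger of the two
```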

Robert Frank
  • Hint: When $k$ is small compared to $n$, how do the two values compare? – whuber Apr 06 '19 at 17:06
  • When k is small compared to n, it of course has little impact on the result. On my datasets, k is often n - 1, so it does have an impact. – Robert Frank Apr 06 '19 at 21:07
  • I've been using the (n - k) denominator for years in my code. However, a little knowledge is a dangerous thing, because a user is now questioning why I'm subtracting k, since neither he nor I have seen k included in discussions on the web other than the first reference in my top post. Where can I find an authoritative reference for subtracting k? – Robert Frank Apr 06 '19 at 21:13
  • Any textbook on multiple regression will derive and explain it. The purpose is to obtain an unbiased estimator of the variance (not its square root!). You can find many discussions of this on our site. – whuber Apr 06 '19 at 21:15
  • I think you mean denominator. Can you please fix the question? – behold Apr 06 '19 at 23:17

3 Answers

Look at Wikipedia: since you're working with a sample and not the population, you need to subtract the number of parameters being estimated (including the constant) to remove the bias. As whuber noted, most of the time the number of parameters being estimated is small compared to $n$, which is why some implementations ignore the correction.
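
The same degrees-of-freedom idea shows up in NumPy's variance estimator via its `ddof` argument; here is a small sketch of the analogy (the residuals are made up, and $k = 2$ is assumed):

```python
import numpy as np

# Hypothetical OLS residuals; with an intercept in the model they sum to
# zero, so centering by the mean (which np.var does) changes nothing here.
residuals = np.array([0.5, -1.2, 0.3, 0.9, -0.6, 0.1])
k = 2  # assumed number of estimated parameters (intercept + one slope)

biased = np.var(residuals, ddof=0)    # divide by n (NumPy's default)
unbiased = np.var(residuals, ddof=k)  # divide by n - k, removing the bias

print(biased, unbiased)  # the n - k version is larger
```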

behold

This may be confusing, because two different definitions are used depending on the context. In statistics, in the context of regression modelling, we use $n-k$ in the denominator. This is discussed on Wikipedia:

In regression analysis, the term mean squared error is sometimes used to refer to the unbiased estimate of error variance: the residual sum of squares divided by the number of degrees of freedom . . .

On the other hand, in machine learning, by RMSE people usually mean the square root of the averaged squared errors (i.e., $n$ in the denominator). This is how it is implemented in every major machine learning package, including scikit-learn, TensorFlow, and others.
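
For instance, scikit-learn's `mean_squared_error` averages over all $n$ samples with no degrees-of-freedom correction, so its square root matches the second formula in the question (the numbers below are made up):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # hypothetical targets
y_pred = np.array([2.5,  0.0, 2.0, 8.0])  # hypothetical predictions

# scikit-learn divides the sum of squared errors by n
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

# Identical to the plain "n in the denominator" formula
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))

assert np.isclose(rmse_sklearn, rmse_manual)
print(rmse_sklearn)
```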

Tim

I just want to elaborate a bit on the answer that behold gave.

The $k$ correction matters most when you have limited observations relative to the number of parameters being estimated. As $n \rightarrow \infty$, $k$ becomes negligible, and so does the inflation of your error estimate caused by the parameterisation.
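
A quick way to see this: the ratio between the two versions is $\sqrt{n/(n-k)}$, which tends to 1 as $n$ grows. A minimal sketch, with $k = 5$ assumed:

```python
import numpy as np

k = 5  # assumed number of estimated parameters
for n in [10, 50, 100, 1_000, 100_000]:
    # RMSE with n - k divided by RMSE with n; approaches 1 as n grows
    print(n, np.sqrt(n / (n - k)))
```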

In simple terms, you're "acknowledging" that you have a limited amount of data and you've used extensive parameterisation to describe your model.

Since it's really not difficult to implement the $k$ parameter, I'd say that you should always use it.

tionichm
  • You might be confusing this issue with the [adjusted R-squared](https://stats.stackexchange.com/search?q=adjusted+R-squared) statistic. – whuber Apr 09 '19 at 18:07
  • They are related to some extent, I'd say. I just decided to leave out the parts about the adjusted R-squared to avoid confusion; clearly I missed that goal. The point I was trying to make is that, _like_ the adjusted R-squared, you're accounting for your biases. Maybe I'll trim my answer further to clear up the ambiguity. Thanks man. – tionichm Apr 10 '19 at 08:35