
When performing a linear regression in R, I came across the following terms.

    NBA_test = read.csv("NBA_test.csv")
    PointsPredictions = predict(PointsReg4, newdata = NBA_test)
    SSE = sum((PointsPredictions - NBA_test$PTS)^2)
    SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2)
    R2 = 1 - SSE/SST

In this case I am predicting the number of points. I understand what is meant by SSE (the sum of squared errors), but what exactly are SST and R squared? Also, what is the difference between R2 and RMSE?

user3796494

2 Answers


Assume that you have $n$ observations $y_i$ and that you have an estimator that estimates the values $\hat{y}_i$.

The mean squared error is $MSE=\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$, the root mean squared error is the square root thus $RMSE=\sqrt{MSE}$.

The $R^2$ is equal to $R^2=1-\frac{SSE}{TSS}$, where $SSE$ is the sum of squared errors, $SSE=\sum_{i=1}^n (y_i - \hat{y}_i)^2$, and by definition this is equal to $SSE=n \times MSE$.

The $TSS$ is the total sum of squares and is equal to $TSS=\sum_{i=1}^n (y_i - \bar{y})^2$, where $\bar{y}=\frac{1}{n}\sum_{i=1}^n y_i$. So $R^2=1-\frac{n \times MSE}{\sum_{i=1}^n (y_i - \bar{y})^2}$.
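
To make these definitions concrete, here is a minimal R sketch on simulated data (the variables, seed and sample size are made up for illustration; this is not the NBA data from the question). It checks that $1-\frac{SSE}{TSS}$ reproduces the in-sample $R^2$ reported by lm(), and that $RMSE=\sqrt{MSE}$:

    # Minimal sketch on simulated data (hypothetical example, not the NBA files).
    set.seed(1)
    x <- rnorm(100)
    y <- 2 + 3 * x + rnorm(100)

    fit  <- lm(y ~ x)
    yhat <- fitted(fit)

    SSE  <- sum((y - yhat)^2)        # sum of squared errors
    TSS  <- sum((y - mean(y))^2)     # total sum of squares
    MSE  <- SSE / length(y)          # mean squared error (dividing by n)
    RMSE <- sqrt(MSE)                # root mean squared error

    R2 <- 1 - SSE / TSS
    all.equal(R2, summary(fit)$r.squared)   # TRUE: matches lm()'s in-sample R^2

Note that the question's out-of-sample code uses mean(NBA$PTS) (presumably the training-set mean) as the baseline in SST rather than the test-set mean; the check above is the in-sample version.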

For a regression with an intercept, $R^2$ is between 0 and 1, and from its definition $R^2=1-\frac{SSE}{TSS}$ we can find an interpretation: $\frac{SSE}{TSS}$ is the sum of squared errors divided by the total sum of squares, so it is the fraction of the total sum of squares that is contained in the error term. One minus this is the fraction of the total sum of squares that is not in the error, so $R^2$ is the fraction of the total sum of squares that is 'explained by' the regression.

The RMSE is a measure of the average deviation of the estimates from the observed values (this is what @user3796494 also said).

For $R^2$ you can also take a look at "Can the coefficient of determination $R^2$ be more than one? What is its upper bound?"

  • fcop, note that the MSE and RMSE depend on a degrees-of-freedom correction for the number of estimated parameters, i.e. instead of dividing by n one has to divide by n-k, where k is the number of parameters fitted, including the constant; see http://seismo.berkeley.edu/~kirchner/eps_120/Toolkits/Toolkit_10.pdf. Just registered, so I cannot add this as a comment. – yadrimz Nov 07 '15 at 14:13
  • @yadrimz: the 'usual' definition of MSE and RMSE divides by $n$, see e.g. http://www.geosci-model-dev.net/7/1247/2014/gmd-7-1247-2014.pdf, bottom of page 2. The **adjusted** $R^2$ corrects for the number of independent variables, but RMSE and MSE usually do not. This is confirmed by http://math.stackexchange.com/questions/488964/the-definition-of-nmse-normalized-mean-square-error – Nov 08 '15 at 08:27
  • The reason this has been confirmed as the 'general' case is that the number of parameters $k$ is assumed to be equal to 0. Regardless, this is not always the case, especially in linear regression, as it might lead to misleading results. That is why, for example, MATLAB's implementation counts the number of parameters and subtracts them from the total. I would encourage you to refer to Berkeley's, MIT's or Edinburgh's solutions of the problem. – yadrimz Nov 08 '15 at 13:52
  • @yadrimz: I will look it up, but maybe it would be better if you gave an answer to the question; if it is (in my honest opinion) a good answer, then I will vote for it. – Nov 08 '15 at 18:15
  • One could show that $RMSE = \sqrt{\frac{(1-R^2)\times TSS}{n}}$. – Baraliuh Jan 18 '22 at 22:29
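
As a quick numerical check of the identity in the last comment (again on simulated, hypothetical data):

    # Check RMSE = sqrt((1 - R^2) * TSS / n) on simulated data (hypothetical).
    set.seed(3)
    x <- rnorm(50)
    y <- 1 + x + rnorm(50)
    fit  <- lm(y ~ x)
    n    <- length(y)
    TSS  <- sum((y - mean(y))^2)
    R2   <- summary(fit)$r.squared
    RMSE <- sqrt(mean(residuals(fit)^2))
    all.equal(RMSE, sqrt((1 - R2) * TSS / n))   # TRUE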

Both indicate the goodness of the fit.

R-squared is conveniently scaled between 0 and 1, whereas RMSE is not scaled to any particular values and is expressed in the units of the response. This can be good or bad; obviously R-squared is easier to interpret, but with RMSE we know explicitly how much our predictions deviate, on average, from the actual values in the dataset. So in a way, RMSE tells you more.
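
As a small illustration (simulated, hypothetical data), refitting the same model with the response rescaled leaves $R^2$ unchanged while RMSE scales with the units of the response:

    # Hypothetical simulated data; only the scale of the response changes.
    set.seed(2)
    x <- rnorm(200)
    y <- 5 + 2 * x + rnorm(200)

    fit1 <- lm(y ~ x)
    fit2 <- lm(I(1000 * y) ~ x)   # same relationship, response in other "units"

    rmse <- function(fit) sqrt(mean(residuals(fit)^2))

    summary(fit1)$r.squared   # identical to ...
    summary(fit2)$r.squared   # ... this value
    rmse(fit1)                # roughly 1 (the noise sd)
    rmse(fit2)                # roughly 1000 times larger

Because $R^2$ is a unit-free proportion it does not move, while RMSE inherits the units of whatever you are predicting.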

I also found this video really helpful.

user3796494
  • Note that $R^2$ can be negative in a regression without an intercept, see http://stats.stackexchange.com/questions/164586/what-is-the-upper-bound-on-r2-not-1/164702#164702 – Aug 27 '15 at 11:27