Questions tagged [r-squared]

The coefficient of determination, usually symbolized by $R^2$, is the proportion of the total response variance explained by a regression model. The tag can also be used for the various pseudo-R-squared measures that have been proposed, for instance for logistic regression (and other models).

The coefficient of determination, usually symbolized by $R^2$, is the proportion of the total response variance explained by a regression model. In the case of simple linear regression it is the square of the Pearson product-moment correlation coefficient between the predictor and response variables. It is equivalently calculated as:

$$ R^2 = \frac{SS_{\rm total} - SS_{\rm resid}}{SS_{\rm total}} $$
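
As a quick check, the same quantity can be computed directly from the two sums of squares. The snippet below is a minimal sketch on simulated data (the seed, data, and variable names are illustrative, not taken from any particular question):

    set.seed(1)
    x <- rnorm(100)
    y <- 2 + 3 * x + rnorm(100)          # simple linear relationship plus noise

    fit <- lm(y ~ x)
    ss_total <- sum((y - mean(y))^2)     # total sum of squares
    ss_resid <- sum(residuals(fit)^2)    # residual sum of squares

    r2_manual <- (ss_total - ss_resid) / ss_total
    c(manual  = r2_manual,
      from_lm = summary(fit)$r.squared,
      cor_sq  = cor(x, y)^2)             # all three agree in simple regression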

$R^2$ tends to increase (i.e., look better) when variables are added to a multiple regression model, even if those variables are irrelevant. To counteract this, an adjusted $R^2$ statistic has been developed:

$$ R^2_\text{adj} = 1-(1-R^2)\frac{N-1}{N-p-1} $$
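
A minimal sketch of the adjustment, again on simulated data (the variable names and the added noise predictor are illustrative): adding an irrelevant predictor nudges $R^2$ up slightly, while the adjusted value is penalized for the extra parameter and matches what summary() reports in R.

    set.seed(2)
    n <- 100
    x <- rnorm(n)
    z <- rnorm(n)                        # irrelevant predictor
    y <- 2 + 3 * x + rnorm(n)            # y does not depend on z

    fit <- lm(y ~ x + z)
    r2  <- summary(fit)$r.squared
    p   <- 2                             # number of predictors (x and z)

    r2_adj <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
    c(manual  = r2_adj,
      from_lm = summary(fit)$adj.r.squared)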

$R^2$ in the form given above is appropriate for models with normally distributed errors. It is not appropriate for other models, such as logistic regression. A variety of 'pseudo-$R^2$' statistics have been developed to provide similar information outside the context of linear models.
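
One common deviance-based version is $1 - \text{residual deviance}/\text{null deviance}$; for ungrouped binary responses this coincides with McFadden's pseudo-$R^2$. A minimal sketch with R's glm() on simulated binary data (the simulated data are illustrative):

    set.seed(3)
    x <- rnorm(200)
    y <- rbinom(200, 1, plogis(-0.5 + 1.2 * x))   # simulated binary response

    fit <- glm(y ~ x, family = binomial)
    1 - fit$deviance / fit$null.deviance          # deviance-based pseudo-R^2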

940 questions

269 votes • 6 answers

Is $R^2$ useful or dangerous?

I was skimming through some lecture notes by Cosma Shalizi (in particular, section 2.1.1 of the second lecture), and was reminded that you can get very low $R^2$ even when you have a completely linear model. To paraphrase Shalizi's example: suppose…
raegtin • 9,090 • 12 • 48 • 53

160 votes • 9 answers

When is it ok to remove the intercept in a linear regression model?

I am running linear regression models and wondering what the conditions are for removing the intercept term. In comparing results from two different regressions where one has the intercept and the other does not, I notice that the $R^2$ of the…
analyticsPierce • 1,793 • 3 • 12 • 6

147 votes • 3 answers

When is R squared negative?

My understanding is that $R^2$ cannot be negative as it is the square of $R$. However, I ran a simple linear regression in SPSS with a single independent variable and a dependent variable. My SPSS output gives me a negative value for $R^2$. If I was to…
Anne • 1,967 • 6 • 17 • 13

128 votes • 2 answers

Removal of statistically significant intercept term increases $R^2$ in linear model

In a simple linear model with a single explanatory variable, $\alpha_i = \beta_0 + \beta_1 \delta_i + \epsilon_i$ I find that removing the intercept term improves the fit greatly (value of $R^2$ goes from 0.3 to 0.9). However, the intercept term…
Ernest A • 2,062 • 3 • 17 • 16

68 votes • 8 answers

Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)?

I have SPSS output for a logistic regression model. The output reports two measures for the model fit, Cox & Snell and Nagelkerke. So as a rule of thumb, which of these $R^2$ measures would you report as the model fit? Or, which of these fit indices…
Henrik • 13,314 • 9 • 63 • 123

51 votes • 5 answers

Relationship between $R^2$ and correlation coefficient

Let's say I have two 1-dimensional arrays, $a_1$ and $a_2$. Each contains 100 data points. $a_1$ is the actual data, and $a_2$ is the model prediction. In this case, the $R^2$ value would be: $$ R^2 = 1 - \frac{SS_{res}}{SS_{tot}}…
Shawn Wang • 1,245 • 3 • 12 • 12

44 votes • 2 answers

What is the adjusted R-squared formula in lm in R and how should it be interpreted?

What is the exact formula used in R's lm() for the adjusted R-squared? How can I interpret it? Adjusted R-squared formulas: there seem to be several formulas for calculating adjusted R-squared. Wherry’s formula:…
user1272262

43 votes • 1 answer

Manually calculated $R^2$ doesn't match up with randomForest() $R^2$ for testing new data

I know this is a fairly specific R question, but I may be thinking about the proportion of variance explained, $R^2$, incorrectly. Here goes. I'm trying to use the R package randomForest. I have some training data and testing data. When I fit a random…
Stephen Turner • 4,183 • 8 • 27 • 33

41 votes • 1 answer

What is the difference between "coefficient of determination" and "mean squared error"?

For regression problems, I have seen people use the "coefficient of determination" (a.k.a. R squared) to perform model selection, e.g., finding the appropriate penalty coefficient for regularization. However, it is also common to use "mean squared…
dolaameng • 513 • 1 • 5 • 5

30 votes • 1 answer

Is there any difference between $r^2$ and $R^2$?

The correlation coefficient is usually written with a capital $R$ but sometimes not. I wonder if there really is a difference between $r^2$ and $R^2$? Can $r$ mean something other than a correlation coefficient?
DJack • 517 • 1 • 6 • 15

30 votes • 2 answers

What is the distribution of $R^2$ in linear regression under the null hypothesis? Why is its mode not at zero when $k>3$?

What is the distribution of the coefficient of determination, or R squared, $R^2$, in linear univariate multiple regression under the null hypothesis $H_0:\beta=0$? How does it depend on the number of predictors $k$ and number of samples $n>k$? Is…
amoeba • 93,463 • 28 • 275 • 317

30 votes • 1 answer

Geometric interpretation of multiple correlation coefficient $R$ and coefficient of determination $R^2$

I am interested in the geometric meaning of the multiple correlation $R$ and coefficient of determination $R^2$ in the regression $y_i = \beta_1 + \beta_2 x_{2,i} + \dots + \beta_k x_{k,i} + \epsilon_i $, or in vector notation, $$\mathbf{y} =…
Silverfish • 20,678 • 23 • 92 • 180

29 votes • 4 answers

Pseudo R squared formula for GLMs

I found a formula for pseudo-$R^2$ in the book Extending the Linear Model with R by Julian J. Faraway (p. 59): $$1-\frac{\text{ResidualDeviance}}{\text{NullDeviance}}$$ Is this a common formula for pseudo-$R^2$ for GLMs?
MarkDollar • 5,575 • 14 • 44 • 60

28 votes • 4 answers

Importance of predictors in multiple regression: Partial $R^2$ vs. standardized coefficients

I am wondering what the exact relationship between partial $R^2$ and coefficients in a linear model is and whether I should use only one or both to illustrate the importance and influence of factors. As far as I know, with summary I get estimates of…
27 votes • 4 answers

What does negative R-squared mean?

Let's say I have some data, and then I fit the data with a model (a non-linear regression). Then I calculate the R-squared ($R^2$). When R-squared is negative, what does that mean? Does that mean my model is bad? I know the range of $R^2$ can be…
RockTheStar • 11,277 • 31 • 63 • 89