
I am trying to calculate the $R^2$ value for a production constrained spatial interaction model, using Fotheringham and O'Kelly (1989) as my guide.

I get dramatically different values for $R^2$ depending on whether I calculate it as `1 - sse/sst` or as `cor(obs, pred)^2`. Is this result expected? Of course, I may well be miscalculating somewhere along the line.

I want to use $R^2$ as a (flawed but nevertheless useful and widely understood) measure of goodness of fit, as recommended by Fotheringham & Knudsen (1987).

A reproducible example is below. I've saved my model output to a CSV to save space here.

predobs <- read.csv("http://dl.dropbox.com/u/66606821/pred_obs.csv")
sst <- sum((predobs$obs - mean(predobs$obs))^2)
sse <- sum((predobs$obs - predobs$pred)^2)
(r.square.1 <- 1 - (sse/sst))
(r.square.2 <- cor(predobs$obs, predobs$pred)^2)
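The same divergence can be reproduced without the CSV. A minimal sketch with synthetic, deliberately biased predictions (the numbers are made up, not from my model):

```r
# Synthetic sketch (made-up data, not the actual model output):
# predictions that are correlated with the observations but biased
# make the two "R-squared" formulas diverge.
set.seed(1)
obs  <- rpois(100, 50)
pred <- 0.5 * obs + 30 + rnorm(100, sd = 2)  # correlated with obs, but biased

sst <- sum((obs - mean(obs))^2)
sse <- sum((obs - pred)^2)
1 - sse/sst        # penalised for the systematic bias (much lower)
cor(obs, pred)^2   # measures linear association only (stays high)
```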
fmark
Both have been suggested as different types of pseudo R-squared values for non-OLS regression models. Perhaps looking up the literature on the various proposed pseudo R-squared values will be fruitful. – Andy W May 01 '12 at 02:59
  • Thanks @AndyW I'll take that as a "no". Before I hit the books, is there any standard nomenclature to differentiate these two measures? – fmark May 01 '12 at 04:13
Off the cuff I don't remember; the different pseudo R-squares are sometimes named after the people who suggested them. See this question for some examples and discussion of ones used for logistic regression models: http://stats.stackexchange.com/q/3559/1036. – Andy W May 01 '12 at 04:31
    These should only be the same if you are performing a linear regression! Looking at your data, if I try `lfit – shabbychef May 01 '12 at 04:38

2 Answers


As long as your Gaussian linear model contains an intercept, the R squared always equals the squared correlation between the observations and the predicted values:

> y <- runif(100)
> x <- rpois(100,5)
> w <- gl(4,25)
> 
> # first model with quantitative covariate :
> fit <- lm(y~x)
> summary(fit)$r.squared
[1] 0.01387019
> pred <- fit$fitted
> cor(y,pred)^2
[1] 0.01387019
> 
> # second model with quantitative covariates :
> fit <- lm(y~x+I(x^2))
> summary(fit)$r.squared
[1] 0.01930005
> pred <- fit$fitted
> cor(y,pred)^2
[1] 0.01930005
> 
> # model with qualitative factor :
> fit <- lm(y~w)
> summary(fit)$r.squared
[1] 0.01269687
> pred <- fit$fitted
> cor(y,pred)^2
[1] 0.01269687

This fact is sometimes called the "materialization of the R squared".
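Conversely, the identity can fail as soon as the intercept is dropped. A quick sketch (my own illustration, with arbitrary simulated data):

```r
# Without an intercept the residuals need not sum to zero, so the
# SST = SSR + SSE decomposition fails and the two quantities disagree.
set.seed(42)
x <- rpois(100, 5)
y <- runif(100) + 2              # shifted away from zero on purpose
fit0 <- lm(y ~ x - 1)            # no-intercept regression
1 - sum(resid(fit0)^2) / sum((y - mean(y))^2)  # can even go negative
cor(y, fitted(fit0))^2                          # bounded in [0, 1]
```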

Stéphane Laurent

They are equivalent when one is performing linear regression with an intercept term.
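A sketch of why the intercept matters (standard OLS algebra: with an intercept the residuals $e = y - \hat{y}$ have mean zero and are orthogonal to $\hat{y}$, so $\operatorname{Cov}(y,\hat{y}) = \operatorname{Var}(\hat{y})$ and $SST = SSR + SSE$):

$$\operatorname{cor}(y,\hat{y})^2 = \frac{\operatorname{Cov}(y,\hat{y})^2}{\operatorname{Var}(y)\operatorname{Var}(\hat{y})} = \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)} = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST}.$$

Without the intercept the residuals need not have mean zero, both steps break, and the two quantities can differ arbitrarily.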

shabbychef