0

I need to calculate the $R^2$ for a specific regression model in R, since the specific function I'm using doesn't return it. I'm trying to do so based on this excellent answer, but am still unable to replicate the results I get using lm. Below, a minimal example based on a slice of my actual data.

Note: I understand my $R^2$ may be misleadingly high, since I'm running a model without an intercept. Regardless, I can't replicate R's output even when I include the intercept.

> # A slice of my data
> df
# A tibble: 20 x 3
         y x         w
     <dbl> <chr> <dbl>
 1  0.212  c     1.44 
 2  0.419  c     0.758
 3 -0.579  c     1.38 
 4 -0.523  a     0.419
 5 -0.833  a     0.610
 6 -0.686  a     2.06 
 7 -0.173  a     1    
 8  0.0649 c     1.16 
 9 -1.04   b     0.753
10 -0.913  b     2.46 
11  0.189  a     1    
12 -0.382  c     1    
13 -0.770  b     1    
14 -0.611  b     0.392
15 -0.536  c     1    
16 -0.199  a     1    
17 -0.968  b     0.631
18 -1.89   b     1    
19 -0.511  c     0.898
20  0.581  c     0.775
> 
> # Regression without intercept
> reg <- lm(y ~ 0 + x, df, weights = w)
> summary(reg)

Call:
lm(formula = y ~ 0 + x, data = df, weights = w)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-0.8385 -0.3569  0.1230  0.2743  0.6189 

Coefficients:
   Estimate Std. Error t value Pr(>|t|)    
xa  -0.3816     0.1729  -2.206   0.0414 *  
xb  -1.0475     0.1710  -6.125 1.12e-05 ***
xc  -0.1226     0.1472  -0.833   0.4164    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4269 on 17 degrees of freedom
Multiple R-squared:  0.7171,    Adjusted R-squared:  0.6671 
F-statistic: 14.36 on 3 and 17 DF,  p-value: 6.481e-05

> 
> # Calculating R^2
> rss <- sum((reg$residuals * df$w)^2)
> tss <- sum((df$y * df$w - weighted.mean(df$y, df$w))^2)
> 1-rss/tss
[1] 0.6502406
> 
> 
> # I can't replicate the results even when I include the intercept
> reg2 <- lm(y ~ x, df, weights = w)
> summary(reg2)

Call:
lm(formula = y ~ x, data = df, weights = w)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-0.8385 -0.3569  0.1230  0.2743  0.6189 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -0.3816     0.1729  -2.206   0.0414 *
xb           -0.6660     0.2432  -2.738   0.0140 *
xc            0.2590     0.2271   1.140   0.2699  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4269 on 17 degrees of freedom
Multiple R-squared:  0.5034,    Adjusted R-squared:  0.445 
F-statistic: 8.616 on 2 and 17 DF,  p-value: 0.002607

> 
> # Calculating R^2
> rss <- sum((reg2$residuals * df$w)^2)
> tss <- sum((df$y * df$w - weighted.mean(df$y, df$w))^2)
> 1-rss/tss
[1] 0.6502406

Edit

As pointed out by @whuber, I had en error in my rss formula. I then went and looked at the code for summary.lm. It calculates the $r^2$ using the mean sum of squares, rather than the total sum of squares. I can, thus, replicate R's results if I use the mss, rather than the tss. What I still don't understand is that, theoretically, the TSS should be equal to RSS + MSS, but I'm not getting that. Code below:

> # I can replicate it when I use the mean sum of squares
> rss <- sum(df$w * reg$residuals^2)
> mss <- sum(df$w * reg$fitted.values^2)
> mss/(rss + mss)
[1] 0.7170557
> 
> # or, alternatively
> 1-rss/(rss+mss)
[1] 0.7170557
> 
> # What I don't understand now is why I can't use the total sum of squares 
> tss <- sum(df$w * (df$y - weighted.mean(df$y, df$w))^2)
> 1-rss/tss
[1] 0.5033989
> 
> # The TSS should be equal to RSS + MSS
> rss+mss
[1] 10.9486
> tss
[1] 6.238093
> 
  • Which package is `weighted.mean` from? – Roland Aug 28 '20 at 06:02
  • R stats, part of core R. – Lucas De Abreu Maia Aug 28 '20 at 14:36
  • Your calculation of `rss` at the end of your code is incorrect. See lines 8 and 15 of the code for `summary.lm`. – whuber Aug 28 '20 at 15:04
  • You're right. Thanks for pointing it out. And it seems that I still have an issue with the tss formula, though I can't spot it. I went and looked at the code for summary.lm. I Can now replicate R's R^2 if I use the mean sum of squares, rather than the total sum of squares. What I still don't understand is why I can't use the TSS. I've edited the question to reflect that. – Lucas De Abreu Maia Aug 28 '20 at 16:52

0 Answers0