R's linear model function lm appears to have a bug when the model is fit without an intercept term.

I want to know whether this is really a bug or a mistake on my part.

library(dplyr)

rm(list=ls())

m2 <- 
  lm(data=iris, Sepal.Length ~ Sepal.Width + Species - 1)

res2 <-
  iris %>%
  mutate(real = Sepal.Length) %>%
  mutate(pred = predict(m2, ., type='response')) %>%
  mutate(error = real - pred) %>%
  mutate(error.sq = error^2) %>%
  mutate(sse = sum(error.sq)) %>%
  mutate(sst = sum((real - mean(real))^2)) %>%
  mutate(r.sq = 1-sse/sst) %>%
  mutate(idx = 1:n())

m2 %>% .$coef
  Sepal.Width     Speciessetosa Speciesversicolor  Speciesvirginica 
    0.8035609         2.2513932         3.7101363         4.1982099 

m2 %>% summary %>% .$r.square
[1] 0.9946393

res2$r.sq %>% unique
[1] 0.7259066

Without an intercept, the R-squared reported by summary() (0.9946393) differs from my manual calculation (0.7259066).

m1 <- 
  lm(data=iris, Sepal.Length ~ Sepal.Width + Species)

res1 <-
  iris %>%
  mutate(real = Sepal.Length) %>%
  mutate(pred = predict(m1, ., type='response')) %>%
  mutate(error = real - pred) %>%
  mutate(error.sq = error^2) %>%
  mutate(sse = sum(error.sq)) %>%
  mutate(sst = sum((real - mean(real))^2)) %>%
  mutate(r.sq = 1-sse/sst) %>%
  mutate(idx = 1:n())

m1 %>% .$coef
  (Intercept)       Sepal.Width Speciesversicolor  Speciesvirginica 
    2.2513932         0.8035609         1.4587431         1.9468166 

m1 %>% summary %>% .$r.square
[1] 0.7259066

res1$r.sq %>% unique
[1] 0.7259066

In contrast, with an intercept the reported R-squared matches the manual calculation.

Curycu

1 Answer

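R is correct. Without an intercept, summary.lm calculates SST differently: it uses the uncentered total sum of squares, sum(y^2), instead of the mean-centered sum((y - mean(y))^2) used in your manual calculation.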

Here is the reason and manual calculation:

Why I am getting different $R^2$ from R LM and manual calculation?
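
As a minimal sketch of that manual calculation, the snippet below refits the no-intercept model from the question and computes R-squared with both versions of SST. The variable names (sst_centered, sst_uncentered, and so on) are illustrative only and do not come from the question or the linked post.

```r
# Refit the no-intercept model from the question (iris ships with base R).
m2 <- lm(Sepal.Length ~ Sepal.Width + Species - 1, data = iris)

y   <- iris$Sepal.Length
sse <- sum(residuals(m2)^2)             # residual sum of squares

sst_centered   <- sum((y - mean(y))^2)  # total SS around the mean (intercept case)
sst_uncentered <- sum(y^2)              # total SS around zero (no-intercept case)

r2_centered   <- 1 - sse / sst_centered    # the question's manual value
r2_uncentered <- 1 - sse / sst_uncentered  # what summary() reports here

print(c(centered = r2_centered, uncentered = r2_uncentered))

# Check the uncentered version against summary.lm
all.equal(r2_uncentered, summary(m2)$r.squared)
```

The two values should reproduce the 0.7259066 and 0.9946393 shown in the question. Because the three Species dummies together span the intercept, both models have identical fitted values, so the mean-centered R-squared is arguably the more meaningful one for comparing them.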

Haitao Du