An ANOVA can be described as a regression on dummy variables. You can, for example, calculate the treatment sum of squares in an ANOVA table from the coefficients of a linear model:

> y <- rnorm(10)
> x1 <- as.factor(c(0,0,0,0,0,0,1,1,1,1))
> y.bar <- mean(y)
> f1 <- lm(y ~ x1)
> sum(((f1$coef[1]) - y.bar)^2)*6 + sum(((f1$coef[1] + f1$coef[2]) - y.bar)^2)*4
[1] 1.784887
> anova(f1)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x1         1 1.7849  1.7849   1.596  0.242
Residuals  8 8.9470  1.1184
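
The same quantity can be computed without hard-coding the group sizes, since for a linear model with an intercept the model sum of squares equals the sum of squared deviations of the fitted values from the grand mean:

## equivalent check, reusing the objects defined above
sum((fitted(f1) - y.bar)^2)

which again gives the 1.7849 shown in the x1 row.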

However, it is less clear how this works when using two or more continuous predictors:

> x2 <- rnorm(10)
> x3 <- rnorm(10)
> f2 <- lm(y ~ x2 + x3)
> anova(f2)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x2         1 0.7797 0.77970  0.5959 0.4654
x3         1 0.7934 0.79336  0.6064 0.4617
Residuals  7 9.1588 1.30841

How are the sums of squares calculated in this case, and how should they be interpreted?


1 Answer


https://rcompanion.org/rcompanion/d_04.html explains this well, in particular how you can get apparently inconsistent results depending on whether Type I, II, or III sums of squares are used when the model contains interactions.
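
One way to see the order dependence of Type I (sequential) sums of squares is to fit the same model with the terms listed in either order; with correlated predictors the two tables differ. A small sketch with made-up data (the seed, sample size, and variable names here are arbitrary):

set.seed(1)
a <- rnorm(20)
b <- a + rnorm(20)        # deliberately correlated with a
z <- rnorm(20)
anova(lm(z ~ a + b))      # SS for a ignores b; SS for b is adjusted for a
anova(lm(z ~ b + a))      # reversing the order generally changes both sums of squares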

In a model like yours with only main effects it's pretty easy. anova() in R reports sequential (Type I) sums of squares: the sum of squares for a term is the difference between the residual sum of squares (SSE) of the model without that term and the SSE once the term is added, with terms entered in the order they appear in the formula.

set.seed(123) ## never forget this when using rnorm
x1 <- rnorm(10)
x2 <- rnorm(10)
y  <- rnorm(10)
f2 <- lm(y ~ x1 + x2)  ## full model with both predictors
anova(f2)              ## sequential (Type I) ANOVA table

gives:

> anova(f2)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value Pr(>F)
x1         1 1.2851 1.28508  1.7246 0.2305
x2         1 1.2965 1.29652  1.7399 0.2287
Residuals  7 5.2161 0.74515 

and

f1 <- lm(y ~ x1)
sum(residuals(f1)^2)  - sum(residuals(f2)^2) 

gives

> sum(residuals(f1)^2)  - sum(residuals(f2)^2) 
[1] 1.296519

which is the same 1.2965 displayed in the Sum Sq cell of the x2 row.
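
Because the sums of squares are sequential, the x1 row can be reproduced the same way by comparing the intercept-only model with the model containing only x1, and the term and residual sums of squares add up to the total sum of squares. A short check, reusing f1 and f2 from above (f0 is just a new name for the intercept-only fit):

f0 <- lm(y ~ 1)                              # intercept-only model
sum(residuals(f0)^2) - sum(residuals(f1)^2)  # reproduces 1.2851, the x1 Sum Sq
sum(residuals(f0)^2)                         # total SS = 1.2851 + 1.2965 + 5.2161

In other words, anova() decomposes the total sum of squares of y around its mean into pieces attributed to each term in the order entered, plus a residual.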
