Let's say there is a continuous variable y and a grouping (factor) variable x with 3 different levels: a, b, and c.
x = sample(letters[1:3], size = 300, replace = TRUE)
y = rnorm(300)
I want to test two contrasts: (1) y for group a versus the mean of y for groups b and c, and (2) y for group a versus y for group c.
Below is how I do this:
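library(magrittr)  # provides the %>% pipe used below
# Each column holds the weights of one contrast for groups a, b, c
contr_mat = matrix(c(1, -0.5, -0.5,
                     1,  0,   -1),
                   nrow = 3, ncol = 2)
lm(y ~ x, contrasts = list(x = contr_mat)) %>% summary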
However, according to here and here, the correct way to do this is:
library(MASS)
lm(y ~ x, contrasts = list(x = ginv(t(contr_mat)))) %>% summary
which produces different results in terms of both the regression coefficients and the p-values. But neither of the above links explains why it is the correct way. The ?lm documentation also does not mention anything about ginv. Could someone explain why I should take the generalized inverse (ginv) of the transpose (t) of the original contrast matrix to get the correct contrast matrix?
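For concreteness, here is a minimal sketch of how the two fits could be checked against the raw group means. It relies on one fact about a one-way model: the coefficients equal solve(cbind(1, C)) %*% (group means), where C is the coding matrix passed to contrasts. The seed and the names group_means, intended, weights_raw and weights_ginv are my own assumptions for illustration and are not part of the code above.

```r
library(MASS)  # for ginv()

set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
x <- factor(sample(letters[1:3], size = 300, replace = TRUE))
y <- rnorm(300)

# Columns hold the intended contrast weights for groups a, b, c.
contr_mat <- matrix(c(1, -0.5, -0.5,
                      1,  0,   -1),
                    nrow = 3, ncol = 2)

# Observed group means and the two contrasts of means I actually want.
group_means <- as.numeric(tapply(y, x, mean))
intended <- drop(t(contr_mat) %*% group_means)

# Fit 1: contr_mat passed directly as the coding matrix.
fit_raw <- lm(y ~ x, contrasts = list(x = contr_mat))
# Fit 2: generalized inverse of the transposed weights as the coding matrix.
fit_ginv <- lm(y ~ x, contrasts = list(x = ginv(t(contr_mat))))

# Weights on the group means that each coefficient estimates:
# the rows of the inverse of cbind(intercept column, coding matrix).
weights_raw  <- solve(cbind(1, contr_mat))
weights_ginv <- solve(cbind(1, ginv(t(contr_mat))))

print(round(weights_raw, 3))
print(round(weights_ginv, 3))

# Side-by-side: intended contrasts vs. the non-intercept coefficients.
print(cbind(intended,
            raw  = coef(fit_raw)[-1],
            ginv = coef(fit_ginv)[-1]))
```

In this sketch, rows 2 and 3 of weights_ginv reproduce t(contr_mat), so the ginv fit's coefficients match the intended contrasts of group means. Row 3 of weights_raw is (0, 1, -1), so the second coefficient of the first fit estimates group b minus group c rather than a minus c.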