
Let's say there is a continuous variable y and a grouping (factor) variable x with 3 different levels: a, b, and c.

x = sample(letters[1:3], size = 300, replace = TRUE)
y = rnorm(300)

I want to test two contrasts: (1) the mean of y in group a versus the average of the means of y in groups b and c, and (2) the mean of y in group a versus the mean of y in group c.
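To make the two targets concrete, here is a minimal sketch that computes them directly from the group means. The seed and the names `group_means`, `contrast_1`, and `contrast_2` are my own additions for illustration; the data are regenerated so the snippet runs on its own.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- sample(letters[1:3], size = 300, replace = TRUE)
y <- rnorm(300)

# Mean of y within each level of x (entries named a, b, c)
group_means <- tapply(y, x, mean)

# (1) group a versus the average of groups b and c
contrast_1 <- unname(group_means["a"] - (group_means["b"] + group_means["c"]) / 2)

# (2) group a versus group c
contrast_2 <- unname(group_means["a"] - group_means["c"])

c(contrast_1 = contrast_1, contrast_2 = contrast_2)
```

These are the two numbers I would expect the regression coefficients for x to estimate.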

Below is how I do this:

contr_mat = matrix(c(1, -0.5, -0.5,
                     1, 0, -1),
                   nrow = 3, ncol = 2)

library(magrittr)  # provides the %>% pipe
lm(y ~ x, contrasts = list(x = contr_mat)) %>% summary

However, according to here and here, the correct way to do this is:

library(MASS)
lm(y ~ x, contrasts = list(x = ginv(t(contr_mat)))) %>% summary

which produces different regression coefficients and different p-values. Neither of the linked posts explains why this is the correct way, and the ?lm documentation does not mention ginv at all. Could someone explain why I should pass the generalized inverse (ginv) of the transpose (t) of my original contrast matrix, rather than the contrast matrix itself?
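As a quick numerical check (not an explanation), the sketch below fits both versions and puts the coefficients for x next to the contrasts computed directly from the group means. The seed and the names `fit_raw`, `fit_ginv`, and `target` are assumptions added for illustration. On simulated data like this, the ginv version should reproduce the group-mean contrasts, while the raw version should not.

```r
library(MASS)  # for ginv()

set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- sample(letters[1:3], size = 300, replace = TRUE)
y <- rnorm(300)

# Columns hold the weights for contrast (1) and contrast (2)
contr_mat <- matrix(c(1, -0.5, -0.5,
                      1,  0,   -1),
                    nrow = 3, ncol = 2)

fit_raw  <- lm(y ~ x, contrasts = list(x = contr_mat))
fit_ginv <- lm(y ~ x, contrasts = list(x = ginv(t(contr_mat))))

# Contrasts computed directly from the group means (levels a, b, c)
group_means <- as.vector(tapply(y, x, mean))
target <- as.vector(t(contr_mat) %*% group_means)

# Compare the two non-intercept coefficients of each fit with the target
rbind(target = target,
      raw    = unname(coef(fit_raw)[-1]),
      ginv   = unname(coef(fit_ginv)[-1]))
```

So the question is why the weights I wrote down have to be transformed this way before lm gives coefficients with that interpretation.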

kjetil b halvorsen