In an answer to another question I saw a simulation in which a model containing only an interaction term produced a rank-deficient design matrix. I know that fitting an interaction-only model is often not a good idea.

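library(magrittr)  # provides the %>% pipe used below

set.seed(15)
# Balanced 2 x 2 design with 10 replicates per cell
dt <- expand.grid(sex = c("male", "female"), hand = c("left","right"), reps = 1:10)

# Simulate Y so that only the female:right cell has a non-zero mean (5)
X <- model.matrix(~ sex*hand, data = dt)
dt$Y <- X %*% c(0, 0, 0, 5) + rnorm(nrow(dt))
lm(Y ~ sex:hand, dt) %>% summary()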
Coefficients: (1 not defined because of singularities)
                  Estimate Std. Error    t value Pr(>|t|)   
(Intercept)           4.9137     0.2699   18.20  < 2e-16 ***
sexmale:handleft     -4.4457     0.3817  -11.65 9.10e-14 ***
sexfemale:handleft   -4.6311     0.3817  -12.13 2.80e-14 ***
sexmale:handright    -4.8112     0.3817  -12.60 9.14e-15 ***
sexfemale:handright       NA         NA      NA       NA    

> X <- model.matrix(~ sex:hand, data = dt)
> library(Matrix)
> rankMatrix(X)
[1] 4
> ncol(X)
[1] 5

I see the same thing in similar simulations of my own.

Why does this happen? My guess is that it is related to the fact that when you know the sum of $n$ variables, you only need $n-1$ of them to determine the $n$th, but I can't quite work it out in terms of the design matrix.
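A quick check that seems consistent with this guess, offered only as a minimal sketch: it rebuilds the same balanced design as above and tests whether the four interaction indicator columns add up to the intercept column. Using base R's `qr()` for the rank is my own substitution for `rankMatrix()`, so that no extra package is needed.

```r
# Rebuild the same balanced 2 x 2 design used above (10 replicates per cell).
dt <- expand.grid(sex = c("male", "female"), hand = c("left", "right"), reps = 1:10)

# Interaction-only formula: R keeps the intercept and, because no main
# effects are present, builds one indicator column per sex/hand cell.
X <- model.matrix(~ sex:hand, data = dt)
colnames(X)

# Every row belongs to exactly one cell, so the four indicators sum to 1
# in every row, which reproduces the intercept column.
all(rowSums(X[, -1]) == X[, "(Intercept)"])

# Rank from a QR decomposition (base R), compared with the number of columns.
qr(X)$rank
ncol(X)
```

If the check returns `TRUE`, one column is an exact linear combination of the others, which would explain why the rank is one less than the number of columns and why `lm()` reports one coefficient as not defined.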

LeelaSella
  • See the accepted answer to https://stackoverflow.com/questions/40729701/how-to-use-formula-in-r-to-exclude-main-effect-but-retain-interaction – Sergio Aug 31 '20 at 15:05
