I saw a simulation in an answer to another question in which a model containing only an interaction term produced a rank-deficient design matrix. I know that fitting a model with only an interaction term is often not a good idea.
library(magrittr)  # for the %>% pipe
set.seed(15)
dt <- expand.grid(sex = c("male", "female"), hand = c("left", "right"), reps = 1:10)
X <- model.matrix(~ sex*hand, data = dt)
dt$Y <- drop(X %*% c(0, 0, 0, 5)) + rnorm(nrow(dt))  # true mean 5 in the female/right cell, 0 elsewhere
lm(Y ~ sex:hand, dt) %>% summary()
Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           4.9137     0.2699   18.20  < 2e-16 ***
sexmale:handleft     -4.4457     0.3817  -11.65 9.10e-14 ***
sexfemale:handleft   -4.6311     0.3817  -12.13 2.80e-14 ***
sexmale:handright    -4.8112     0.3817  -12.60 9.14e-15 ***
sexfemale:handright       NA         NA      NA       NA
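Because `sexfemale:handright` is the coefficient reported as `NA`, the female/right cell effectively becomes the baseline. The sketch below is one way to check that reading; it re-creates `dt` so it runs on its own, and `fit`, `cell_means` and `b` are illustrative names. If the reading is right, the intercept should equal the sample mean of the female/right cell, and each remaining coefficient should equal its own cell mean minus that baseline mean.

```r
# Re-create the simulated data from the question so this block is self-contained
set.seed(15)
dt <- expand.grid(sex = c("male", "female"), hand = c("left", "right"), reps = 1:10)
X <- model.matrix(~ sex * hand, data = dt)
dt$Y <- drop(X %*% c(0, 0, 0, 5)) + rnorm(nrow(dt))

fit <- lm(Y ~ sex:hand, data = dt)
b <- coef(fit)  # includes NA for the aliased column

# Sample mean of Y in each cell (rows = sex levels, columns = hand levels)
cell_means <- tapply(dt$Y, list(dt$sex, dt$hand), mean)
print(cell_means)

# Intercept vs. the mean of the female/right (aliased) cell
print(c(intercept = unname(b["(Intercept)"]),
        female_right_mean = cell_means["female", "right"]))

# A remaining coefficient vs. the difference of cell means
print(c(male_left_coef = unname(b["sexmale:handleft"]),
        mean_difference = cell_means["male", "left"] - cell_means["female", "right"]))
```

With four parameters for four cells the fit is saturated, so the fitted values are the cell means whichever column `lm` drops.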
> X <- model.matrix(~ sex:hand, data = dt)
> library(Matrix)
> rankMatrix(X)
[1] 4
> ncol(X)
[1] 5
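For comparison, the following base-R sketch prints the columns R generates for the full-factorial formula and for the interaction-only formula, along with their ranks computed via `qr()` instead of `Matrix::rankMatrix()`. The names `X_full` and `X_int` are illustrative. The full-factorial matrix has 4 columns of rank 4. The interaction-only matrix has 5 columns but is still only rank 4, and stacking the two matrices side by side does not raise the rank, so both span the same space.

```r
dt <- expand.grid(sex = c("male", "female"), hand = c("left", "right"), reps = 1:10)

X_full <- model.matrix(~ sex * hand, data = dt)  # intercept + main effects + interaction
X_int  <- model.matrix(~ sex:hand, data = dt)    # intercept + interaction only

# How R coded each formula
print(colnames(X_full))
print(colnames(X_int))

# Rank (via QR decomposition) versus number of columns
print(c(rank = qr(X_full)$rank, ncol = ncol(X_full)))
print(c(rank = qr(X_int)$rank, ncol = ncol(X_int)))

# Rank of both sets of columns together
print(qr(cbind(X_full, X_int))$rank)
```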
The same thing happened in similar simulations of my own.
Why does this happen? My guess is that it has something to do with the fact that, if you know the sum of $n$ variables, you only need $n-1$ of them to determine the $n$th, but I can't quite work out how this plays out in the design matrix.
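One way to test that guess directly on the design matrix is sketched below. It uses base R only, and `cell_cols`, `dependency`, `recovered` and `X0` are illustrative names. Every row falls in exactly one sex-by-hand cell, so exactly one of the four interaction indicators equals 1 in each row. The four indicators therefore sum to the intercept column, which is the "know the sum, need only $n-1$" situation with $n = 4$. The final lines refit without an intercept (`~ 0 + sex:hand`) only to show that the rank then matches the number of columns.

```r
dt <- expand.grid(sex = c("male", "female"), hand = c("left", "right"), reps = 1:10)
X <- model.matrix(~ sex:hand, data = dt)

# Drop the intercept column; each row should contain exactly one 1 among the indicators
cell_cols <- X[, -1]
print(table(rowSums(cell_cols)))

# Intercept minus the sum of the four indicators is zero in every row:
# a non-trivial linear combination of columns equal to zero, hence rank deficiency
dependency <- c(-1, 1, 1, 1, 1)
print(max(abs(X %*% dependency)))

# Equivalently, the last indicator is determined by the intercept and the other three
recovered <- X[, "(Intercept)"] - rowSums(X[, 2:4])
print(all(recovered == X[, "sexfemale:handright"]))

# Without the intercept the four indicator columns are linearly independent
X0 <- model.matrix(~ 0 + sex:hand, data = dt)
print(c(rank = qr(X0)$rank, ncol = ncol(X0)))
```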