I want to do step-by-step cross-validation using glmnet. For this I need all the variables (columns) produced by model.matrix.
For example, one of the examples in the R documentation for model.matrix is:
dd <- data.frame(a = gl(3,4), b = gl(4,1,12))
options("contrasts")
model.matrix(~ a + b, dd, contrasts = list(a = "contr.sum"))
This gives:
   (Intercept) a1 a2 b2 b3 b4
1            1  1  0  0  0  0
2            1  1  0  1  0  0
3            1  1  0  0  1  0
4            1  1  0  0  0  1
5            1  0  1  0  0  0
6            1  0  1  1  0  0
7            1  0  1  0  1  0
8            1  0  1  0  0  1
9            1 -1 -1  0  0  0
10           1 -1 -1  1  0  0
11           1 -1 -1  0  1  0
12           1 -1 -1  0  0  1
But a has 3 levels in total and b has 4. I also want the remaining variables a3 and b1, because I will later use the design matrix to calculate the coefficients for a given lambda:
library(glmnet)
x <- model.matrix(~ a + b, dd, contrasts = list(a = "contr.sum"))
y <- sample(0:1, 12, replace = TRUE)  # random 0/1 response, for illustration only
coef(glmnet(x, y, family = "binomial"), s = 0.01)
(Intercept)  2.900983
(Intercept)  .
a1           .
a2           .
b2          -2.128341
b3          -2.128577
b4          -2.128798
Questions:
1. Why are a3 and b1 not included in the variables above?
2. Is it possible to include a3 and b1 in the design matrix, so that they also appear in the coefficients?
3. If 2 is possible, is it also mathematically correct to do so?
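
To make question 2 concrete, here is a minimal sketch of what I have in mind. It assumes that passing full (identity) contrast matrices through contrasts.arg is an acceptable way to keep every level; the names full_contrasts and x_full are my own and not part of any package. I am not certain this is the recommended approach.

```r
# Sketch for question 2: keep one indicator column per factor level.
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

# contrasts(f, contrasts = FALSE) returns an identity matrix with one
# column per level, so no level is dropped as the reference level.
full_contrasts <- lapply(dd, contrasts, contrasts = FALSE)

# Build the design matrix using the full indicator coding for a and b.
x_full <- model.matrix(~ a + b, dd, contrasts.arg = full_contrasts)
colnames(x_full)
```

With this coding the a columns (and likewise the b columns) sum to the intercept column in every row, so the columns are linearly dependent. That dependence is what prompts question 3.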