I am having difficulty setting up this model matrix. I have been looking through some tutorials and questions online, but I can't seem to find the answer to my problem. I am at the point where I have decided I probably have no idea what I am doing. Here is my sample table:
```r
dat <- structure(list(condition = structure(c(2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Ctrl", "Drug1",
"Drug1_Drug2"), class = "factor"), genotype = structure(c(2L,
2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("KO",
"WT"), class = "factor"), replicate = c(1, 2, 3, 1, 1, 2, 3,
1, 2, 3, 1, 2, 3, 1, 2, 3)), .Names = c("condition", "genotype",
"replicate"), row.names = c("Drug1_WT1", "Drug1_WT2", "Drug1_WT3",
"Drug1_KO1", "Drug1_Drug2_WT1", "Drug1_Drug2_WT2", "Drug1_Drug2_WT3",
"Drug1_Drug2_KO1", "Drug1_Drug2_KO2", "Drug1_Drug2_KO3", "Ctrl_WT1",
"Ctrl_WT2", "Ctrl_WT3", "Ctrl_KO1", "Ctrl_KO2", "Ctrl_KO3"), class = "data.frame")
```
Which looks like this when formatted:
```
                  condition genotype replicate
Drug1_WT1             Drug1       WT         1
Drug1_WT2             Drug1       WT         2
Drug1_WT3             Drug1       WT         3
Drug1_KO1             Drug1       KO         1
Drug1_Drug2_WT1 Drug1_Drug2       WT         1
Drug1_Drug2_WT2 Drug1_Drug2       WT         2
Drug1_Drug2_WT3 Drug1_Drug2       WT         3
Drug1_Drug2_KO1 Drug1_Drug2       KO         1
Drug1_Drug2_KO2 Drug1_Drug2       KO         2
Drug1_Drug2_KO3 Drug1_Drug2       KO         3
Ctrl_WT1               Ctrl       WT         1
Ctrl_WT2               Ctrl       WT         2
Ctrl_WT3               Ctrl       WT         3
Ctrl_KO1               Ctrl       KO         1
Ctrl_KO2               Ctrl       KO         2
Ctrl_KO3               Ctrl       KO         3
```
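A quick cross-tabulation summarizes how many samples fall into each condition x genotype cell. The sketch below is base R; the compact `data.frame()` rebuild is only a shorthand for the `dput()` output above (same row order, with the `replicate` column left out because it plays no part in the design).

```r
# Rebuild the sample table (same row order as the dput() output; replicate omitted)
dat <- data.frame(
  condition = factor(rep(c("Drug1", "Drug1_Drug2", "Ctrl"), times = c(4, 6, 6)),
                     levels = c("Ctrl", "Drug1", "Drug1_Drug2")),
  genotype  = factor(rep(c("WT", "KO", "WT", "KO", "WT", "KO"),
                         times = c(3, 1, 3, 3, 3, 3)),
                     levels = c("KO", "WT"))
)

# Number of samples in every condition x genotype cell
counts <- with(dat, table(condition, genotype))
print(counts)
```

Every cell is populated, which an interaction model such as `~ condition * genotype` needs in order to be full rank, but the Drug1/KO cell rests on a single sample.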
Everything labeled "WT" in the genotype column is my signal, and everything labeled "KO" is my background. What I essentially want is a way to make the following comparisons, with the background (KO) separated from the signal (WT). I am feeding my model matrix to an R package that will do the rest; I just need to make sure it is set up correctly.
- Drug1WT v. CtrlWT (with Drug1KO and CtrlKO as background)
- Drug1_Drug2WT v. CtrlWT (with Drug1_Drug2KO and CtrlKO as background)
- Drug1WT v. Drug1_Drug2WT (with Drug1KO and Drug1_Drug2KO as background)
I would also like to compare:
- Drug1WT v. Drug1KO
- Drug1_Drug2WT v. Drug1_Drug2KO
- CtrlWT v. CtrlKO
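One common way to encode comparisons like these is a cell-means design (no intercept, one indicator column per condition/genotype combination) together with a contrast matrix whose columns hold the weights for each comparison. The base-R sketch below assumes that "with ... KO as background" means comparing the WT-minus-KO difference between two conditions, i.e. a difference of differences. The helper `contrast_vec()` and the contrast names are hypothetical, chosen only for illustration.

```r
# Rebuild the sample table (same row order as the dput() output; replicate omitted)
dat <- data.frame(
  condition = factor(rep(c("Drug1", "Drug1_Drug2", "Ctrl"), times = c(4, 6, 6)),
                     levels = c("Ctrl", "Drug1", "Drug1_Drug2")),
  genotype  = factor(rep(c("WT", "KO", "WT", "KO", "WT", "KO"),
                         times = c(3, 1, 3, 3, 3, 3)),
                     levels = c("KO", "WT"))
)

# One group per condition x genotype cell, e.g. "Drug1.WT"
dat$group <- interaction(dat$condition, dat$genotype, sep = ".")

# Cell-means design: no intercept, so every group gets its own indicator column
design <- model.matrix(~ 0 + group, data = dat)
colnames(design) <- levels(dat$group)

# Hypothetical helper: expand named weights into a full-length contrast vector
contrast_vec <- function(...) {
  w <- c(...)
  v <- setNames(numeric(ncol(design)), colnames(design))
  stopifnot(all(names(w) %in% names(v)))  # catch typos in group names
  v[names(w)] <- w
  v
}

# Rows = design columns, columns = comparisons
cm <- cbind(
  # (Drug1 WT - Drug1 KO) - (Ctrl WT - Ctrl KO)
  Drug1_vs_Ctrl = contrast_vec(Drug1.WT = 1, Drug1.KO = -1, Ctrl.WT = -1, Ctrl.KO = 1),
  # (Drug1_Drug2 WT - Drug1_Drug2 KO) - (Ctrl WT - Ctrl KO)
  Drug1Drug2_vs_Ctrl = contrast_vec(Drug1_Drug2.WT = 1, Drug1_Drug2.KO = -1,
                                    Ctrl.WT = -1, Ctrl.KO = 1),
  # (Drug1 WT - Drug1 KO) - (Drug1_Drug2 WT - Drug1_Drug2 KO)
  Drug1_vs_Drug1Drug2 = contrast_vec(Drug1.WT = 1, Drug1.KO = -1,
                                     Drug1_Drug2.WT = -1, Drug1_Drug2.KO = 1),
  # Plain WT - KO within each condition
  Drug1_WT_vs_KO      = contrast_vec(Drug1.WT = 1, Drug1.KO = -1),
  Drug1Drug2_WT_vs_KO = contrast_vec(Drug1_Drug2.WT = 1, Drug1_Drug2.KO = -1),
  Ctrl_WT_vs_KO       = contrast_vec(Ctrl.WT = 1, Ctrl.KO = -1)
)
rownames(cm) <- colnames(design)
print(cm)
```

If the downstream package happens to be limma, its `makeContrasts()` produces a matrix of this same shape (one row per design column, one column per comparison) for `contrasts.fit()`; other packages have their own conventions, so their documentation is the place to confirm the expected layout.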
I am not sure whether these comparisons make sense with the way my table is set up. Here is what I have tried (the sample table is assigned to `dat`):

```r
with(dat, model.matrix(~ condition * genotype))
```
This gives me:
```
   (Intercept) conditionDrug1 conditionDrug1_Drug2 genotypeWT
1            1              1                    0          1
2            1              1                    0          1
3            1              1                    0          1
4            1              1                    0          0
5            1              0                    1          1
6            1              0                    1          1
7            1              0                    1          1
8            1              0                    1          0
9            1              0                    1          0
10           1              0                    1          0
11           1              0                    0          1
12           1              0                    0          1
13           1              0                    0          1
14           1              0                    0          0
15           1              0                    0          0
16           1              0                    0          0
   conditionDrug1:genotypeWT conditionDrug1_Drug2:genotypeWT
1                          1                               0
2                          1                               0
3                          1                               0
4                          0                               0
5                          0                               1
6                          0                               1
7                          0                               1
8                          0                               0
9                          0                               0
10                         0                               0
11                         0                               0
12                         0                               0
13                         0                               0
14                         0                               0
15                         0                               0
16                         0                               0
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$condition
[1] "contr.treatment"
attr(,"contrasts")$genotype
[1] "contr.treatment"
```
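The `contrasts` attribute above names R's default treatment coding (`contr.treatment`). Under that coding the first level of each factor ("KO" for genotype, "Ctrl" for condition) is the reference: it gets no column of its own and is absorbed into the intercept. The base-R sketch below shows the coding matrix, what changes after `relevel()`, and why an intercept plus indicators for both genotypes would be redundant. The column name `genotype_wtref` is hypothetical.

```r
# Rebuild the sample table (same row order as the dput() output; replicate omitted)
dat <- data.frame(
  condition = factor(rep(c("Drug1", "Drug1_Drug2", "Ctrl"), times = c(4, 6, 6)),
                     levels = c("Ctrl", "Drug1", "Drug1_Drug2")),
  genotype  = factor(rep(c("WT", "KO", "WT", "KO", "WT", "KO"),
                         times = c(3, 1, 3, 3, 3, 3)),
                     levels = c("KO", "WT"))
)

# Treatment coding: the first level ("KO") maps to 0, "WT" gets the single column
print(levels(dat$genotype))
print(contrasts(dat$genotype))

# Making WT the reference swaps which level is absorbed into the intercept
dat$genotype_wtref <- relevel(dat$genotype, ref = "WT")
mm_wtref <- model.matrix(~ condition * genotype_wtref, data = dat)
print(colnames(mm_wtref))

# An intercept plus indicators for BOTH genotypes is rank deficient,
# because the KO and WT columns always add up to the intercept column
X <- cbind(intercept = 1,
           KO = as.numeric(dat$genotype == "KO"),
           WT = as.numeric(dat$genotype == "WT"))
print(qr(X)$rank)  # 2, not 3
```

The same rule explains why no `conditionCtrl` column appears: "Ctrl" is the reference level of `condition`.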
Questions:
- Why isn't "KO" included in any of the columns of the design matrix?
- How does `model.matrix()` choose which coefficients are part of the model? There are combinations I expected that are missing.
- Based on what I am trying to achieve, how do I set up contrasts to indicate what I want to compare?
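To see which comparisons the interaction columns already encode, one option is to fit the original `~ condition * genotype` design to a made-up, noise-free response in which every sample equals its cell mean; the fitted coefficients then read directly as differences between cell means. The values in `cell_means` are hypothetical and only make the arithmetic visible. Reading "with KO as background" as a difference of differences (an assumption), `conditionDrug1:genotypeWT` is exactly the first comparison in the list.

```r
# Rebuild the sample table (same row order as the dput() output; replicate omitted)
dat <- data.frame(
  condition = factor(rep(c("Drug1", "Drug1_Drug2", "Ctrl"), times = c(4, 6, 6)),
                     levels = c("Ctrl", "Drug1", "Drug1_Drug2")),
  genotype  = factor(rep(c("WT", "KO", "WT", "KO", "WT", "KO"),
                         times = c(3, 1, 3, 3, 3, 3)),
                     levels = c("KO", "WT"))
)

# Hypothetical cell means, one per condition.genotype combination
cell_means <- c(Ctrl.KO = 10, Drug1.KO = 12, Drug1_Drug2.KO = 11,
                Ctrl.WT = 15, Drug1.WT = 20, Drug1_Drug2.WT = 18)

# Noise-free response: each sample takes its cell's mean
dat$y <- unname(cell_means[paste(dat$condition, dat$genotype, sep = ".")])

# Same columns as with(dat, model.matrix(~ condition * genotype))
fit <- lm(y ~ condition * genotype, data = dat)
b <- coef(fit)
print(b)
# (Intercept)                     = Ctrl.KO                                 = 10
# conditionDrug1                  = Drug1.KO - Ctrl.KO                      = 2
# conditionDrug1_Drug2            = Drug1_Drug2.KO - Ctrl.KO                = 1
# genotypeWT                      = Ctrl.WT - Ctrl.KO                       = 5
# conditionDrug1:genotypeWT       = (Drug1.WT - Drug1.KO) - (Ctrl.WT - Ctrl.KO) = 3
# conditionDrug1_Drug2:genotypeWT = (Drug1_Drug2.WT - Drug1_Drug2.KO) - (Ctrl.WT - Ctrl.KO) = 2

# The six listed comparisons written in terms of these coefficients
comparisons <- c(
  Drug1_vs_Ctrl       = unname(b["conditionDrug1:genotypeWT"]),
  Drug1Drug2_vs_Ctrl  = unname(b["conditionDrug1_Drug2:genotypeWT"]),
  Drug1_vs_Drug1Drug2 = unname(b["conditionDrug1:genotypeWT"] -
                               b["conditionDrug1_Drug2:genotypeWT"]),
  Ctrl_WT_vs_KO       = unname(b["genotypeWT"]),
  Drug1_WT_vs_KO      = unname(b["genotypeWT"] + b["conditionDrug1:genotypeWT"]),
  Drug1Drug2_WT_vs_KO = unname(b["genotypeWT"] + b["conditionDrug1_Drug2:genotypeWT"])
)
print(round(comparisons, 6))
```

So the interaction design already contains every comparison, some as single coefficients and others as sums or differences of them; a cell-means design expresses the same quantities as weighted sums of the six group means instead.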
Additionally, any links to tutorials or external sources that could help me out would be appreciated.