Consider the following example. I am studying the mutation burden across three subtypes of cancer. In my dataset, I have individuals without cancer (controls) and individuals with cancer (cases); the cases are either type1 or type2 or type3. The disease variable is coded as controls, type1, type2, and type3. The mutation variable is coded as a continuous variable, with the values ranging from 0 to 5. Then, I have three covariates to adjust for in my analysis. I know already that cases, in general, have a significantly higher number of mutations compared to controls. I'd like to test if there are differences in the mutation burden across the subtypes. I'd like to test this in a single regression, rather than comparing each subtype against controls in separate regressions.
I have two regression approaches (M1 and M2) as shown below.
In the first approach, I code the disease as a multifactorial predictor variable and the mutation burden as the outcome variabe. This approach enables me to perform pairwise comparisons using glht
function from multcomp
package.
myData$disease = relevel(myData$disease, ref = "controls")
M1 <- glm(mutation ~ disease+COV1+COV2+COV3, data=myData, family=gaussian)
Then, I do pairwise comparisons between the subtypes.
library(multcomp)
glht(M1,mcp(disease="Tukey"))
In the second approach, I code the disease variable as multinominal outcome variable and perform a multinomial regression using multinom
function from nnet
package.
library(nnet)
M2 <- multinom(disease~mutation+COV1+COV2+COV3, data=myData)
However, in the second approach, I don't know how to do pairwise comparisons across subtypes as I did in the M1 model.
My questions: Which one is appropriate, M1, or M2? How the interpretations of the coefficients differ between M1 and M2 ? Is it possible to do a pairwise comparison in the M2 model ?