Are the outcome and predictor variables in a logistic/linear regression interchangeable?

Question

Consider the following example. I am studying the mutation burden across three subtypes of cancer. In my dataset, I have individuals without cancer (controls) and individuals with cancer (cases); the cases are either type1 or type2 or type3. The disease variable is coded as controls, type1, type2, and type3. The mutation variable is coded as a continuous variable, with the values ranging from 0 to 5. Then, I have three covariates to adjust for in my analysis. I know already that cases, in general, have a significantly higher number of mutations compared to controls. I'd like to test if there are differences in the mutation burden across the subtypes. I'd like to test this in a single regression, rather than comparing each subtype against controls in separate regressions.

I have two regression approaches (M1 and M2) as shown below.

In the first approach, I code the disease as a multi-level categorical predictor and the mutation burden as the outcome variable. This approach enables me to perform pairwise comparisons using the glht function from the multcomp package.

# relevel() only works on factors; make "controls" the reference level
myData$disease <- relevel(factor(myData$disease), ref = "controls")
M1 <- glm(mutation ~ disease + COV1 + COV2 + COV3, data = myData, family = gaussian)

Then, I do pairwise comparisons between the subtypes.

library(multcomp)
glht(M1, linfct = mcp(disease = "Tukey"))

In the second approach, I code the disease variable as a multinomial outcome variable and perform a multinomial regression using the multinom function from the nnet package.

library(nnet)
M2 <- multinom(disease ~ mutation + COV1 + COV2 + COV3, data = myData)

However, in the second approach, I don't know how to do pairwise comparisons across subtypes as I did in the M1 model.

My questions: Which one is appropriate, M1 or M2? How do the interpretations of the coefficients differ between M1 and M2? Is it possible to do pairwise comparisons in the M2 model?

score 1 · Accepted Answer · answered Sep 11 '20 at 20:09

No, they aren't interchangeable. It may help you to read my answer to: What is the difference between linear regression on y with x and x with y? For an overview of the case with logistic regression, it might be worth reading my answer to: Relationship between regressing Y on X, and X on Y in logistic regression. In the linear regression case, the slopes will differ, but the p-value for the relationship will be the same when there is only one X and only one Y. However, when you include covariates, the $X\rightarrow Y$ and $Y\rightarrow X$ p-values won't be the same unless the covariates are all perfectly orthogonal to both X and Y.
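A quick simulation can illustrate the single-predictor case described above. This is a minimal sketch on made-up data (the variables x and y, the sample size, and the effect size 0.5 are assumptions for illustration, not taken from the question): the two slopes differ, while the t-test p-value is the same in both directions.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # assumed data-generating process, illustration only

fit_yx <- lm(y ~ x)       # regress y on x
fit_xy <- lm(x ~ y)       # regress x on y

# Slopes: generally not equal, and not reciprocals of each other
slope_yx <- coef(fit_yx)[["x"]]
slope_xy <- coef(fit_xy)[["y"]]

# p-values for the slope: identical in the one-X, one-Y case
p_yx <- summary(fit_yx)$coefficients["x", "Pr(>|t|)"]
p_xy <- summary(fit_xy)$coefficients["y", "Pr(>|t|)"]

print(c(slope_yx = slope_yx, slope_xy = slope_xy))
print(c(p_yx = p_yx, p_xy = p_xy))

# The product of the two slopes equals the squared correlation
print(c(product = slope_yx * slope_xy, r_squared = cor(x, y)^2))
```

The last line shows why the slopes cannot both be "the" effect: their product is the squared correlation, so they are reciprocals only when the correlation is perfect.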

So which model should you use? The simplest way to think about this (although not generally correct) is to assume a causal relationship. That is, are you thinking that disease type causes mutation burden, or that the mutation burden causes the disease to be of a certain type? This is a useful heuristic, but note that your data appear to be observational (you didn't independently manipulate the disease types or the mutation burdens), so you aren't necessarily licensed to infer causality from these models. In a predictive context, you can say to yourself, 'in the future I will have data on <disease type / mutation burden> but not on <mutation burden / disease type> and I will want to use this model to make an educated guess about the true value of <mutation burden / disease type>'. In which case, you use the future unknown as the response here. More generally, regression models assume the X values are fixed and known, and that the uncertainty about the relationship is due to sampling error in Y. Thus, ask yourself whether you think the noise in the system primarily lives in X or Y and put that as the response.
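To make the two directions concrete for the models in the question, here is a hedged sketch that fits both on simulated data. Everything about the data is an assumption for illustration (the data frame sim, the group means, and the use of a single covariate COV1 rather than three); only the model formulas mirror M1 and M2. It assumes the nnet package is available, which is normally the case because it ships with standard R installations.

```r
library(nnet)  # provides multinom()

set.seed(42)
n <- 400
lvls <- c("controls", "type1", "type2", "type3")

# Hypothetical simulated data; "controls" is the first (reference) level
disease <- factor(sample(lvls, n, replace = TRUE), levels = lvls)
COV1 <- rnorm(n)
# Assumed group means for mutation burden (illustrative values only)
group_mean <- c(controls = 1.0, type1 = 2.0, type2 = 2.5, type3 = 3.0)
mutation <- unname(group_mean[as.character(disease)]) + 0.3 * COV1 + rnorm(n, sd = 0.8)
sim <- data.frame(disease, mutation, COV1)

# Direction 1: mutation burden as the response
M1 <- glm(mutation ~ disease + COV1, data = sim, family = gaussian)
# Direction 2: disease category as the response
M2 <- multinom(disease ~ mutation + COV1, data = sim, trace = FALSE)

# M1: each diseasetypeK coefficient is the adjusted difference in mean
# mutation burden between typeK and controls
print(coef(M1))

# M2: row typeK, column "mutation" is the change in the log-odds of being
# typeK rather than a control per one-unit increase in mutation burden
print(coef(M2))
```

The M1 coefficients are on the scale of the mutation variable, whereas the M2 coefficients are log-odds relative to the reference category, so the two sets of numbers answer different questions and are not directly comparable.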

Do you know if it's possible (or statistically meaningful) to do pairwise comparisons among the outcomes in the M2 model that I describe? — Veera, Sep 17 '20 at 06:55
@Veera, there aren't really pairwise comparisons in a multinomial. You can turn it around & fit the covariates as a function of the response w/ discriminant analysis, but it's still not quite the same thing. — gung - Reinstate Monica, Sep 17 '20 at 11:35
