I've drawn the following DAG to represent my data-generating process: [image: DAG]

I'm interested in estimating the effects of G1 and G3 on OS via a linear model. My understanding is that, assuming the DAG is correct, I can estimate the effect of G1 by conditioning on G1 alone because it's directly related to OS. To estimate the effect of G3, I should avoid conditioning on G2 because it's a mediator. I could use the following model forms:

$$Y = \beta_{G1}X_{G1}$$

$$Y = \beta_{G3}X_{G3}$$

My question regards conditioning on the other, non-mediator variables that affect OS. Should I include Smoking, Stage, Gender, Age, and Radiation in the models? They don't seem to meet the definition of confounder from this post, but I wasn't sure if including these variables could yield more accurate estimates (assuming I have sufficient samples).

Should non-confounders that still affect the outcome be included in regression models to increase precision?

Tomas Bencomo
  • Yes, the precision of the estimate depends on the residual variance, so minimizing it by increasing the variance explained in your model improves precision. – Noah Feb 02 '20 at 23:05
  • Are there any common scenarios when you wouldn't include these other variables? – Tomas Bencomo Feb 03 '20 at 00:50
  • If your DAG is wrong and the variables are actually mediators or colliders, they should not be included. Therefore, don't include any that could even possibly be affected by G1 or G3. Including very weak predictors of the outcome might slightly reduce precision, but with a large sample this is unlikely to matter. If the model is nonlinear (e.g., logistic), including those covariates changes the estimand, but not with linear models. – Noah Feb 03 '20 at 00:58
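
The precision point made in the comments can be checked with a small simulation. This is only a sketch under made-up assumptions: `g1` is a hypothetical stand-in for G1, `age` stands for any covariate that affects OS but is independent of G1, and the coefficients (0.5 and 2.0), noise scale, and sample size are arbitrary choices, not values from the question's data.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients and standard errors; X must already contain an intercept column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # unbiased residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical OLS covariance of the coefficients
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 2000                                      # assumed sample size
g1 = rng.normal(size=n)                       # exposure of interest (hypothetical)
age = rng.normal(size=n)                      # affects the outcome only; independent of g1
y = 0.5 * g1 + 2.0 * age + rng.normal(size=n) # assumed linear data-generating process

ones = np.ones(n)
beta_unadj, se_unadj = ols_fit(np.column_stack([ones, g1]), y)
beta_adj, se_adj = ols_fit(np.column_stack([ones, g1, age]), y)

# Index 1 is the g1 coefficient in both design matrices.
print("unadjusted: estimate", beta_unadj[1], "SE", se_unadj[1])
print("adjusted:   estimate", beta_adj[1], "SE", se_adj[1])
```

In this setup both models target the same coefficient on `g1`, because `age` is not a confounder, but the adjusted model has a smaller residual variance and therefore a smaller standard error for that coefficient. The sketch says nothing about the cases the comments warn against, such as a covariate that is really a mediator or collider.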
