I am trying to fit a logistic model to create propensity scores. Looking through the literature, there appears to be some disagreement about which covariates to include when designing such a model. Some say that all covariates that affect both treatment assignment and the outcome should be included. Others advocate including only variables that predict treatment assignment.
When choosing a model, what is our primary goal? Are we most interested in predicting assignment to a treatment group? Or are we most interested in balancing our sample on all covariates?
If we are most interested in prediction, I would think it preferable to go through some model selection process and to be mindful of over-fitting. However, I often see an emphasis on including all covariates rather than on building a model with better out-of-sample prediction.
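To make the two candidate goals concrete, here is a minimal sketch on simulated data that computes one diagnostic for each: the AUC of the propensity model (how well it predicts assignment) and the standardized mean difference of a covariate before and after inverse-probability weighting (how well it balances). Everything in it is an assumption for illustration: the covariates `x1` and `x2`, their coefficients, the sample size, and the helper functions `fit_logistic`, `weighted_smd`, and `auc` are hypothetical, and the AUC is computed in-sample for brevity.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (t - p)                 # score vector
        hess = X.T @ (X * w[:, None])        # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def weighted_smd(x, t, w):
    """Standardized mean difference of covariate x between groups under weights w."""
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    pooled_sd = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2.0)
    return (m1 - m0) / pooled_sd

def auc(score, t):
    """Probability that a random treated unit scores higher than a random control."""
    diff = score[t == 1][:, None] - score[t == 0][None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Simulated data (assumed, not from any real study).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)  # hypothetical confounder: would affect treatment and outcome
x2 = rng.normal(size=n)  # hypothetical variable that affects treatment only
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * x1 + 0.8 * x2))))

# Propensity scores from a logistic model with both covariates.
X = np.column_stack([np.ones(n), x1, x2])
ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))

# Inverse-probability-of-treatment weights.
ipw = np.where(t == 1, 1.0 / ps, 1.0 / (1.0 - ps))

smd_raw = weighted_smd(x1, t, np.ones(n))
smd_ipw = weighted_smd(x1, t, ipw)
model_auc = auc(ps, t)

print("AUC (prediction of assignment):", model_auc)
print("SMD of x1, unweighted:", smd_raw)
print("SMD of x1, IPW-weighted:", smd_ipw)
```

The point of the sketch is only that these are two different yardsticks: a model could be judged by how well it discriminates treated from control units, or by how small the weighted covariate differences become, and the question is which of these should drive covariate selection.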