Let's say that I am fitting a logistic regression model for a binary outcome and I have two covariates: $x_1$ and $x_2$ (both quantitative).

I am confused as to what the correct course of action would be if $x_1$ is statistically insignificant, but upon deleting $x_1$ the estimate of $x_2$ changes dramatically. Obviously, if the estimate of $x_2$ changes dramatically, then it is due to the fact that $x_1$ and $x_2$ are correlated, but my question is this:

Should I keep $x_1$ in the model since it is a confounder of $x_2$ or should I delete $x_1$ because it is insignificant?

shadowtalker
mmmmmmmmmm
    What are your goals? Are you interested in identifying important predictors, are you interested in accurately predicting the outcome, or are you interested in accurately estimating the coefficient on $x_1$? – shadowtalker May 02 '15 at 15:16

1 Answer

In general, you should not eliminate insignificant (I prefer the term "nonsignificant") regressors if the other coefficients change when you remove them. In a logistic regression, the coefficient on $x_1$ is an estimate of the change in the log-odds of the outcome associated with a unit change in $x_1$, conditional on the level of $x_2$. Removing $x_2$ changes this association from one that is conditioned on $x_2$ to one that is marginal with respect to $x_2$. That is, the coefficient estimated without $x_2$ in the model is the coefficient estimated by averaging over $x_2$. The same holds with the roles reversed, as in your question: dropping $x_1$ turns the coefficient on $x_2$ from a conditional association into a marginal one.
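To make the conditional-versus-marginal distinction concrete, here is a minimal simulation sketch using the question's setup (drop $x_1$, watch the coefficient on $x_2$). Everything in it is an assumption chosen for illustration: the correlation of about 0.9, the true coefficients 1.0 and 0.5, the sample size, and the hand-rolled Newton-Raphson fitter `fit_logistic`, which stands in for whatever software you actually use. Unlike your data, the simulated $x_1$ has a real effect and a large sample, so this only illustrates why the coefficient moves, not the significance issue.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(rows, y, iters=15):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Each row must already contain a leading 1.0 for the intercept."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            eta = sum(b * xj for b, xj in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    hess[j][k] += w * x[j] * x[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

random.seed(0)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
# x2 is constructed to have a correlation of about 0.9 with x1 (assumed value).
x2 = [0.9 * a + math.sqrt(1 - 0.9 ** 2) * random.gauss(0, 1) for a in x1]
# Assumed true model: logit P(y = 1) = 1.0 * x1 + 0.5 * x2.
y = [1 if random.random() < 1 / (1 + math.exp(-(1.0 * a + 0.5 * b))) else 0
     for a, b in zip(x1, x2)]

full = fit_logistic([[1.0, a, b] for a, b in zip(x1, x2)], y)  # intercept, x1, x2
reduced = fit_logistic([[1.0, b] for b in x2], y)              # intercept, x2

print("x2 coefficient, conditional on x1:", round(full[2], 3))
print("x2 coefficient, x1 dropped:       ", round(reduced[1], 3))
```

Under these assumptions the coefficient on $x_2$ in the reduced model absorbs much of the effect of the omitted $x_1$, so it comes out well above the conditional value of 0.5. Neither number is "wrong"; they answer different questions.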

I can think of four reasons to ever eliminate a predictor from a model:

  1. You are specifically interested in whether that coefficient is zero.
  2. You are not specifically interested in whether the coefficient is zero, but have so many predictors that your estimates of the other regression coefficients (including the ones you do care about) are too imprecise for your purposes. Then removing many predictors, or predictors with bizarre marginal distributions that are causing numerical precision issues in your fitting algorithm, can improve that situation.
  3. You are specifically interested in making accurate out-of-sample predictions and therefore you're concerned about overfitting.
  4. You are specifically interested in the marginal and not conditional estimate of association.

You are not in scenario 1 and probably not in scenario 2. Whether you're in scenario 4 is up to you, but I would strongly recommend against it. You haven't stated your goals so I can't help you decide about scenario 3, although if you are in that scenario there are more purpose-built alternatives to hypothesis testing.
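If you do turn out to be in scenario 3, one possible alternative (my choice for illustration, not the only one) is to compare the candidate models directly on held-out data instead of testing coefficients. The sketch below assumes simulated data with the same made-up structure as before (correlation about 0.9, true coefficients 1.0 and 0.5), a single train/test split, and a deliberately simple gradient-ascent fitter named `fit`; all of these are hypothetical stand-ins, and in practice you would likely use cross-validation with an established library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(rows, y, steps=200, lr=1.0):
    """Batch gradient ascent on the mean log-likelihood.
    A simple stand-in for a real fitting routine; rows include a leading 1.0."""
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, yi in zip(rows, y):
            err = yi - sigmoid(sum(b * xj for b, xj in zip(beta, x)))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def log_loss(beta, rows, y):
    """Mean negative log-likelihood of the outcomes under the fitted model."""
    total = 0.0
    for x, yi in zip(rows, y):
        p = sigmoid(sum(b * xj for b, xj in zip(beta, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(y)

def simulate(n):
    """Assumed data-generating process: correlated x1 and x2,
    logit P(y = 1) = 1.0 * x1 + 0.5 * x2."""
    data = []
    for _ in range(n):
        a = random.gauss(0, 1)
        b = 0.9 * a + math.sqrt(1 - 0.9 ** 2) * random.gauss(0, 1)
        yi = 1 if random.random() < sigmoid(1.0 * a + 0.5 * b) else 0
        data.append((a, b, yi))
    return data

def design(data, keep_x1):
    """Build design rows with an intercept, with or without x1."""
    return [[1.0, a, b] if keep_x1 else [1.0, b] for a, b, _ in data]

random.seed(1)
train, test = simulate(1000), simulate(1000)
y_train = [yi for _, _, yi in train]
y_test = [yi for _, _, yi in test]

loss_full = log_loss(fit(design(train, True), y_train), design(test, True), y_test)
loss_reduced = log_loss(fit(design(train, False), y_train), design(test, False), y_test)

print("held-out log loss, x1 and x2:", round(loss_full, 4))
print("held-out log loss, x2 only:  ", round(loss_reduced, 4))
```

A lower held-out log loss favors that model for prediction. Note that this says nothing about which coefficient on $x_2$ is the one you want to interpret; that is still the conditional-versus-marginal question.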

And that doesn't even touch the issue of whether you should be running hypothesis tests in the first place.

shadowtalker