Add a covariate to binary logistic regression that backward Wald removed?

Question

I have two explanatory variables A, B and two potential confounders M, N that I also want to check for interactions. So, my covariates are

A, B, M, N, A*M, A*N, B*M, B*N

Logistic regression with backward Wald selection retained M, A*M, and B*N. A stats friend said he didn't like to report interactions without their main effects in the model, so he suggested I introduce N into the model and run the logistic regression again with just M, N, A*M, B*N and no variable selection.

What is the reasoning behind this? B*N was significant in the selected model, but when I add N to the model it is no longer significant, so I'm not sure how to report this.
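
A sketch of the usual argument for keeping N, assuming B is coded 0/1 and N is numeric (the coding is an assumption for illustration, not something stated above). The model that backward Wald selected is

$$
\operatorname{logit} P(Y=1) = \beta_0 + \beta_M M + \beta_{AM}\,AM + \beta_{BN}\,BN .
$$

Here the slope of N is $\beta_{BN}$ when $B=1$ and exactly $0$ when $B=0$, so the model silently asserts that N has no effect in the $B=0$ group. That assertion also depends on an arbitrary coding choice: recoding $B' = 1 - B$ gives $BN = N - B'N$, which cannot be written without a main effect for N. Adding N removes the constraint:

$$
\operatorname{logit} P(Y=1) = \beta_0 + \beta_M M + \beta_N N + \beta_{AM}\,AM + \beta_{BN}\,BN .
$$

Now $\beta_N$ is the slope of N when $B=0$, and $\beta_{BN}$ is the difference in N slopes between $B=1$ and $B=0$. That is a different quantity from the one tested in the first model, which may be why its significance changes. The same argument applies to the omitted main effects of A and B.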

Stepwise variable selection is not a valid way to build models. Carefully pre-specify your model based on the subject matter understanding, and fit one model. Form contrasts, tests, and confidence intervals from that one model. On occasion it's OK to also fit a no-interaction model, just to increase precision. Interactions require 4x larger sample sizes with regard to precision of estimates, in the best of cases. — Frank Harrell, May 09 '21 at 13:07
Thanks @FrankHarrell. My data is in the social sciences and, unfortunately, there's very little work in my area so not much understanding of the subject matter to pre-specify a model. At the same time, critics will cry foul if I don't consider enough potential confounders. So, I'm trying to tease out which variables are confounders and which are not. Isn't that what variable selection is for? I'm new to LR, so may be way off. — buttonsrtoys, May 09 '21 at 15:00
How can critics cry foul if theory does not identify specific confounders? — Alexis, May 09 '21 at 15:25
@Alexis the theory does list potential confounders but there's no data to confirm whether they actually confound. My data is the first and I was using variable selection to toss out those that did not contribute significantly. If I'm understanding the above comments, it's not correct to do that? Does that mean variable selection should never be used for any analysis? If so, what is its purpose? — buttonsrtoys, May 09 '21 at 16:15
Stepwise model building is [hooey](https://stats.stackexchange.com/questions/338804/hypothesis-testing-on-coefficients-in-two-subsets-of-data-after-stepwise-regress/338807#338807), and results in excluded true predictors, included false predictors, *p* values biased to be small, coefficients biased to be large, *F* statistics biased to be large, and all estimates are generally predicated upon a nested series of "conditional upon these variables which we excluded from the analysis." It is never ok, except to make the practitioner look scientific to the uninformed. — Alexis, May 09 '21 at 20:52
@Alexis Thanks for the links. This discussion is eye-opening for me. My study is exploratory and predictive. According to [this answer](https://stats.stackexchange.com/a/258117/125018) it may be acceptable in my case? I also read a 20x factor somewhere. I have 1,100 data samples and 11 IVs. (2 hypothesis IVs, 3 confounder IVs, and 6 interactions). So, lots of data. Does any of this make my use case more acceptable? — buttonsrtoys, May 10 '21 at 10:24
As said earlier, the idea of removing variables that don't seem to affect things is purely wishful thinking. The data do not contain the needed information to make these choices correctly. And don't try to be parsimonious when adjusting for other variables. Fit the full model. — Frank Harrell, May 10 '21 at 11:58
OK. It's finally sinking in. I'm dropping step-wise regression. Thanks! — buttonsrtoys, May 11 '21 at 09:46
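
The point made above about "included false predictors" and "*p* values biased to be small" can be illustrated with a small simulation. This is a minimal sketch under strong simplifying assumptions, not a reproduction of backward Wald selection: it assumes all 11 candidate terms (the count mentioned in the comments) are truly null and that their Wald z statistics are independent. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def smallest_p_among_nulls(k, rng):
    """Smallest two-sided p value among k independent null z statistics."""
    std_normal = NormalDist()
    p_values = []
    for _ in range(k):
        z = rng.gauss(0.0, 1.0)  # z statistic for a predictor with no true effect
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return min(p_values)


def false_inclusion_rate(k, alpha=0.05, n_sim=20000, seed=1):
    """Share of simulated studies where at least one null predictor has p < alpha."""
    rng = random.Random(seed)
    hits = sum(smallest_p_among_nulls(k, rng) < alpha for _ in range(n_sim))
    return hits / n_sim


# Assumption: 11 candidate terms, none of which has a real effect.
simulated = false_inclusion_rate(k=11)
theoretical = 1.0 - 0.95 ** 11  # about 0.43 under independence
print(f"simulated: {simulated:.3f}, theoretical: {theoretical:.3f}")
```

Under these assumptions, roughly 43% of pure-noise studies would still show at least one "significant" term, and the reported *p* value for that term ignores the search that found it. Real stepwise procedures involve correlated tests and sequential refitting, so the exact figure will differ.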

0 Answers