
I am running a stepwise binary logit regression in Stata with 14 independent variables. Two of the independent variables are dummies (taking a value of 0 or 1). I have tested the independent variables for multicollinearity and transformed them, by standardizing or taking the natural logarithm of their values, in order to mitigate this issue (VIF < 2.5). The normal model runs smoothly; however, when I bootstrap the sample (73 observations) with 1,000 replications, I receive p-values of 1.0000. Furthermore, the results conclude with the note: "one or more parameters could not be estimated in 314 bootstrap replicates; standard-error estimates include only complete replications."
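For reference, the setup described above presumably looks something like the following in Stata, where `y` and `x1`–`x14` are placeholder names for the outcome and the 14 predictors:

```stata
* Hypothetical variable names: y is the binary outcome, x1-x14 the predictors.
logit y x1-x14                            // the "normal" model

* The bootstrapped version with 1,000 replications:
bootstrap, reps(1000): logit y x1-x14
```

The note about incomplete replications typically means that in those 314 resamples `logit` either failed to converge or had to drop a parameter, e.g. because a dummy perfectly predicted the outcome in that resample; this is common with only 73 observations and 14 predictors.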

Two questions: 1. Is the VIF threshold that I used correct (VIF < 2.5)? What other ways are there to deal with multicollinearity without dropping one of the variables? 2. Since I assume that multicollinearity is no longer an issue, what else could I have done wrong in my bootstrapping methodology?
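(As an aside on question 1: VIF is a property of the predictors alone, so in Stata it is typically checked via a linear regression, since `estat vif` is only available after `regress`, not after `logit`. A minimal sketch with the same placeholder names:)

```stata
* VIF involves only the X's, so a linear fit suffices for the diagnostic,
* even though the substantive model is a logit.
regress y x1-x14
estat vif
```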

Many thanks in advance for your answer(s)!

Best! Tim

  • Your approach is not honest about the number of parameters estimated. The transformation-estimation process needs to be part of the bootstrap, as does every other modeling step that utilized $Y$. Collinearity, on the other hand, can often ignore $Y$ and can be dealt with before outcome modeling. There is no need to compute $P$-values using the bootstrap, as you already have those from the original model fit. – Frank Harrell May 13 '14 at 12:43
  • Frank, thanks a lot for your quick reply. To put it in layperson's terms: does this mean that I do not need to bootstrap my sample? Isn't the initial sample size of 73 too small to obtain reliable results? Furthermore, what do you mean by "not honest"? That the transformations I chose are not consistent with each other? Unfortunately, the issue of multicollinearity appears when I use a consistent approach. – Tim May 13 '14 at 13:04
  • You are effectively estimating several more parameters when you try different transformations. You need to let the bootstrap repeat *from scratch* all the modeling steps each time, including examining transformations; see the sketch after this comment thread. [This is why just fitting regression splines is often a great approach: the bootstrap just refits the regression splines for each resample.] – Frank Harrell May 13 '14 at 16:26
  • Concerning your question about $n=73$, I wouldn't expect the bootstrap to improve on the accuracy. – Frank Harrell May 13 '14 at 17:11
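To make the "from scratch" point concrete, here is a minimal sketch, under the same placeholder names `y` and `x1`–`x14` (the 0.10 removal threshold is illustrative), of a Stata program that redoes the backward selection inside every bootstrap replicate and tallies how often each predictor survives:

```stata
capture program drop swboot
program define swboot, rclass
    // Redo the entire backward selection on the resampled data.
    stepwise, pr(0.10): logit y x1-x14
    // Record whether each candidate predictor survived this replicate.
    local kept : colnames e(b)
    forvalues i = 1/14 {
        return scalar in`i' = strpos(" `kept' ", " x`i' ") > 0
    }
end

* Build the expression list in1=r(in1) ... in14=r(in14), then bootstrap.
local explist
forvalues i = 1/14 {
    local explist `explist' in`i'=r(in`i')
}
bootstrap `explist', reps(1000): swboot
```

The resulting inclusion fractions show how unstable the selected model is across resamples; any data-driven transformation choices would likewise have to be scripted inside `swboot` for the bootstrap to reflect them honestly.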

1 Answer


Consider not doing stepwise regression, which is a good way to almost ensure biased results:

Malek, M. H., Berger, D. E., and Coburn, J. W. (2007). On the inappropriateness of stepwise regression analysis for model building and testing. European Journal of Applied Physiology, 101(2):263–264.

Steyerberg, E. W., Eijkemans, M. J., and Habbema, J. D. F. (1999). Stepwise selection in small data sets: a simulation study of bias in logistic regression analysis. Journal of Clinical Epidemiology, 52(10):935–942.

Whittingham, M., Stephens, P., Bradbury, R., and Freckleton, R. (2006). Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75(5):1182–1189.

  • Alexis, thanks for your answer. However, since I am applying backward stepwise regression, all of the independent variables are already included in my first model, which yields the described results for the bootstrapped sample. – Tim May 13 '14 at 13:02
  • Which does not get you around the fact that you are, among other things, selecting for the most heteroscedastic predictors. – Alexis May 13 '14 at 17:02