Bootstrapping to reduce influence of outliers

Question

I've fit a regression model, and diagnostic tests show that some values have high Cook's D as well as high DFBETAS on my parameter of interest. The effect does not cross the traditional threshold for statistical significance, but it is in the predicted direction. I could just remove these influential cases and see what happens (doing this pushes the estimate further in the predicted direction), but I wonder if leaving them in and bootstrapping the parameter to dilute the influence of any one data point would be a more conservative, less potentially arbitrary approach. Indeed, when I bootstrap the parameter with 5000 resamples and calculate bias-corrected and accelerated (BCa) 95% confidence intervals, the CI excludes zero.

I know there are discussions on whether to exclude outliers before bootstrapping (for example), but that's not my question here exactly. I'm wondering if this is a reasonable way to approach this issue and whether my interpretation of this pattern of results is safe. Can anyone point me to something, particularly a journal article, discussing using bootstrapping in this way?

Bootstrapping theory does not imply that the bootstrap should have any value in dealing with a small proportion of outliers. In this application, your choice of bootstrapping sounds like a rather sophisticated way to paper over the problem rather than confront it and analyse it. That leads me to suspect such journal articles might not exist--at least not in good journals. — whuber, Aug 23 '17 at 12:59
@whuber -- Thanks for your reply! I believe what you say, but I'm curious to know why it wouldn't dilute the influence of any single data point. If each data point is only included in a subset of the iterations, it would seem that the "sway" of any single data point would be reduced in the bootstrapped CI. Just to help me understand, could you explain why this isn't the case? — YTD, Aug 23 '17 at 14:41
I didn't say that. Indeed, this is precisely the problem: since any one particular observation will be omitted from over a third of the bootstrap samples, the bootstrap cannot do much to help you analyze the influence of that observation. In fact, it will do just the opposite: it will magically make its influence appear to be much less than it actually might be. — whuber, Aug 23 '17 at 16:01
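
To put a number on the "over a third" figure in the comment above: a resample of size n drawn with replacement misses a given observation with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The following is only a rough sketch for checking that by simulation. The function name `omission_fraction`, the sample size n = 50, and the seed are illustrative assumptions, not anything taken from the question's data.

```python
import math
import random

def omission_fraction(n, n_boot=5000, seed=0):
    """Fraction of bootstrap resamples (size n, with replacement)
    that do not contain observation 0. Hypothetical helper."""
    rng = random.Random(seed)
    omitted = 0
    for _ in range(n_boot):
        # One bootstrap resample: n indices drawn with replacement from 0..n-1.
        resample = [rng.randrange(n) for _ in range(n)]
        if 0 not in resample:
            omitted += 1
    return omitted / n_boot

n = 50                          # assumed sample size, for illustration only
simulated = omission_fraction(n)
exact = (1 - 1 / n) ** n        # chance one resample misses a given point
limit = math.exp(-1)            # large-n limit of that probability

print(f"simulated: {simulated:.3f}")
print(f"exact:     {exact:.3f}")
print(f"limit 1/e: {limit:.3f}")
```

If this holds up, an influential case is simply absent from a bit more than a third of the 5000 resamples, so the bootstrap distribution mixes fits with and without it instead of isolating what that case does to the estimate.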


0 Answers