12

I used the following R code to fit a probit model:

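library(RcmdrMisc)  # assumption: stepwise() is the RcmdrMisc function (it is not in base R)
p1 <- glm(natijeh ~ ., family = binomial(link = "probit"), data = data1)
stepwise(p1, direction = 'backward/forward', criterion = 'BIC')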

I want to know what exactly stepwise and 'backward/forward' do, and how they select the variables.

Mahmoud
  • 7
    Some comments by Frank Harrell (http://stats.stackexchange.com/users/4253/frank-harrell) on why stepwise regression is bad: http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/ –  Sep 07 '13 at 17:24
  • 4
    In addition to BabakP's links, also have a look at [this post](http://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection) from the site. – COOLSerdash Sep 07 '13 at 18:32
  • 3
    Yet another post about problems with stepwise (and backward and forward as well) is a paper I wrote with David Cassell: [Stopping Stepwise](http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf) – Peter Flom Sep 07 '13 at 20:35
  • @PeterFlom, in order to reference this paper, I am having some problems understanding the proper citation. Could you please list it here? Thanks. – doug.numbers May 23 '14 at 22:24
  • 2
    @doug.numbers It was presented various places and published as part of conference proceedings. If you Google "Flom, Cassell, Stepwise" you'll get places it was presented and you can format it however you format citations to published presentations. – Peter Flom May 23 '14 at 22:28

2 Answers

11

Principle of stepwise selection

  1. Fit a model with all the variables you wish to consider. This is your current best model.
  2. Remove one variable (or add one from among the variables not used in the current best model). For each such candidate, fit the new model, and compare the candidates with each other and with the current best model according to BIC (or any other criterion, such as AIC). The best of these becomes the new "current best model".

Repeat step 2 until no change reduces the BIC. This gives only a local minimum of BIC, which means you may not get the best model among all possible subsets of variables. But there are usually too many subsets to try them all (2^p for p variables), so this is a way to optimize a bit without too much work.
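To make the two steps above concrete, here is a minimal hand-rolled sketch in R. Everything in it is made up for illustration: the simulated data, the names `dat`, `candidates` and `bic_of`, and the coefficients. It is not the code that any stepwise function actually runs, just the same greedy idea written out.

```r
# Hypothetical simulated data: y depends on x1 and x2 only; x3 and x4 are noise.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, pnorm(0.8 * dat$x1 - 0.6 * dat$x2))

candidates <- c("x1", "x2", "x3", "x4")

# Fit a probit model on the given covariates and return its BIC.
bic_of <- function(vars) {
  rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
  fit <- glm(as.formula(paste("y ~", rhs)),
             family = binomial(link = "probit"), data = dat)
  BIC(fit)
}

# Step 1: start from the full model.
current <- candidates
current_bic <- bic_of(current)

repeat {
  # Step 2: all models reachable by dropping one variable or adding one back.
  drops <- lapply(current, function(v) setdiff(current, v))
  adds  <- lapply(setdiff(candidates, current), function(v) c(current, v))
  neighbours <- c(drops, adds)
  if (length(neighbours) == 0) break
  bics <- vapply(neighbours, bic_of, numeric(1))
  if (min(bics) >= current_bic) break   # no neighbour improves BIC: local minimum
  current <- neighbours[[which.min(bics)]]
  current_bic <- min(bics)
}

current       # covariates in the selected model
current_bic   # its BIC
```

The loop only ever looks at models one variable away from the current one, which is why the result is a local rather than a global minimum of BIC.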

See also Stepwise regression and Model selection on Wikipedia.

5

Stepwise regression basically fits the regression model by adding/dropping covariates one at a time based on a specified criterion (in your example above the criterion would be based on the BIC).

By specifying forward you are telling R that you would like to start with the simplest model (i.e., the intercept-only model with no covariates) and then add covariates one at a time, keeping only the ones that result in an improvement to the model's BIC.

By specifying backward you are telling R that you want to start with the full model (i.e., the model with all the covariates) and then drop covariates, one at a time, when dropping them results in an improvement in the BIC.
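As a rough illustration of the two directions, here is a sketch using base R's `step()` rather than the `stepwise()` call from the question. The data are simulated and the names `dat`, `null_fit` and `full_fit` are hypothetical. Setting `k = log(n)` makes `step()` use a BIC-type penalty; its default `k = 2` corresponds to AIC.

```r
# Hypothetical simulated data: x3 has no effect on y.
set.seed(2)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, pnorm(0.7 * dat$x1 + 0.5 * dat$x2))

null_fit <- glm(y ~ 1, family = binomial(link = "probit"), data = dat)
full_fit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "probit"), data = dat)

# Forward: start from the intercept-only model and add terms one at a time.
fwd <- step(null_fit,
            scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
            direction = "forward", k = log(n), trace = 0)

# Backward: start from the full model and drop terms one at a time.
bwd <- step(full_fit, direction = "backward", k = log(n), trace = 0)

formula(fwd)
formula(bwd)
```

The two directions often end at the same model, but nothing guarantees it, since each one follows its own greedy path.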

Stepwise regression can be a very dangerous statistical procedure because it is not an optimal model selection procedure. The method can lead to very poor model selection, and it does not protect you against problems such as multiple comparisons.
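One way to see the multiple-comparisons issue for yourself is to run a selection on pure noise. The sketch below is only an illustration under made-up settings (`n`, `p`, the seed, and the use of base R's `step()` are all my assumptions). The outcome is generated independently of every covariate, so any covariate that survives the selection is spurious by construction.

```r
# Hypothetical pure-noise data: y is independent of all covariates.
set.seed(4)
n <- 200
p <- 10
X <- as.data.frame(matrix(rnorm(n * p), n, p))   # columns V1 ... V10
X$y <- rbinom(n, 1, 0.5)

full <- glm(y ~ ., family = binomial(link = "probit"), data = X)

# Backward selection with a BIC-type penalty (k = log(n)).
sel <- step(full, direction = "backward", k = log(n), trace = 0)

length(coef(sel)) - 1          # number of noise covariates kept
summary(sel)$coefficients      # these p-values ignore the selection step
```

How many noise covariates are kept depends on the seed and on the criterion (AIC is more permissive than BIC). Whatever is kept is reported with standard errors and p-values that take no account of the search that produced the model.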

  • Thanks. And what about 'backward/forward' ? – Mahmoud Sep 07 '13 at 17:34
  • What do you mean what about backward/forward? –  Sep 07 '13 at 17:44
  • One of the methods of stepwise() in R is 'backward/forward'! Is it a combination of both? – Mahmoud Sep 07 '13 at 17:52
  • 2
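    Oh sorry, now I understand what you are asking. Yes, it is a combination of both: assuming this is the stepwise() from Rcmdr/RcmdrMisc, 'backward/forward' starts from the full model and drops covariates one at a time, and at each step it also checks whether adding back a previously dropped covariate would improve the criterion. –  Sep 07 '13 at 18:10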
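For anyone who wants to watch a combined search step by step: the sketch below uses base R's `step()` with `direction = "both"`, started from the full model. I am assuming this is a reasonable stand-in for the 'backward/forward' option of `stepwise()`; it is not that function itself, and the data and names are made up.

```r
# Hypothetical simulated data: x3 and x4 have no effect on y.
set.seed(3)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, pnorm(0.7 * dat$x1 - 0.5 * dat$x2))

full_fit <- glm(y ~ x1 + x2 + x3 + x4,
                family = binomial(link = "probit"), data = dat)

# At each step, consider both dropping a term and re-adding a dropped one.
both <- step(full_fit,
             scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3 + x4),
             direction = "both", k = log(n), trace = 0)

both$anova      # one row per step: "- term" for a drop, "+ term" for an add
formula(both)   # the model the search stopped at
```

Setting `trace = 1` instead prints the full table of candidate moves and their criterion values at every step.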