Questions tagged [stepwise-regression]

Stepwise regression (often called forward or backward regression) involves fitting a regression model and repeatedly adding or removing predictors, one at a time, based on $t$ statistics, $R^2$ or information criteria, arriving at a final model in a *stepwise* manner. This tag can also be used for forward selection, backward elimination and best-subsets variable selection strategies.
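A minimal sketch of forward selection by AIC follows. It assumes ordinary least squares with Gaussian errors and numeric predictors only. `gaussian_aic` and `forward_select` are hypothetical helpers written for illustration, not any package's API. At each step the code fits every model that adds one more predictor, keeps the addition that lowers AIC most, and stops when no addition lowers it.

```python
import numpy as np

def gaussian_aic(Z, y):
    """AIC of an OLS fit, up to an additive constant: n*log(RSS/n) + 2*(number of coefficients)."""
    n, k = Z.shape
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

def forward_select(X, y):
    """Greedily add the predictor that lowers AIC most; stop when no addition lowers it."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected = []
    best_aic = gaussian_aic(intercept, y)  # start from the intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = {
            j: gaussian_aic(np.hstack([intercept, X[:, selected + [j]]]), y)
            for j in candidates
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_aic:
            break  # no single addition improves AIC
        selected.append(j_best)
        best_aic = scores[j_best]
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)  # only x0 and x3 matter
selected = forward_select(X, y)
print("selected columns:", selected)
```

Backward elimination is the mirror image. It starts from the full model and drops the predictor whose removal lowers AIC most. Because both searches are greedy, neither is guaranteed to find the lowest-AIC subset. Best-subsets search finds that subset by exhaustive enumeration.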

Note that statistical inference based on $p$-values computed after stepwise regression is invalid unless those $p$-values have been adjusted to account for the stepwise model-building step.
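A small simulation shows why unadjusted $p$-values fail. It uses assumed settings: $n = 100$, 20 pure-noise predictors, and the normal approximation 1.96 to the two-sided 5% critical value. The first step of forward selection keeps the single predictor with the largest $|t|$. That maximum exceeds the nominal cutoff far more often than 5% of the time, even though no predictor is related to the response. `ols_t_stats` is a hypothetical helper.

```python
import numpy as np

def ols_t_stats(X, y):
    """t statistics of the slope coefficients in an OLS fit of y on an intercept plus X."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (n - Z.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    return (beta / se)[1:]  # drop the intercept

rng = np.random.default_rng(1)
n, p, n_sims = 100, 20, 500
crit = 1.96  # normal approximation to the two-sided 5% critical value
false_hits = 0
for _ in range(n_sims):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)  # pure noise: no predictor is related to y
    # First forward step: keep the single predictor with the largest |t|.
    t_best = max(abs(ols_t_stats(X[:, [j]], y)[0]) for j in range(p))
    false_hits += int(t_best > crit)
rate = false_hits / n_sims
print(f"'significant' first pick in {rate:.0%} of pure-noise datasets")
```

With roughly independent predictors, the expected rate is about $1 - 0.95^{20} \approx 0.64$. That is the family-wise error of 20 unadjusted tests, not the nominal 0.05.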

295 questions
228 votes · 8 answers

Algorithms for automatic model selection

I would like to implement an algorithm for automatic model selection. I am thinking of doing stepwise regression but anything will do (it has to be based on linear regressions though). My problem is that I am unable to find a methodology, or an…
S4M
87 votes · 5 answers

What are modern, easily used alternatives to stepwise regression?

I have a dataset with around 30 independent variables and would like to construct a generalized linear model (GLM) to explore the relationship between them and the dependent variable. I am aware that the method I was taught for this situation,…
31 votes · 5 answers

Detecting significant predictors out of many independent variables

In a dataset of two non-overlapping populations (patients & healthy, total $n=60$) I would like to find (out of $300$ independent variables) significant predictors for a continuous dependent variable. Correlation between predictors is present. I am…
29 votes · 2 answers

Why are p-values misleading after performing a stepwise selection?

Let's consider for example a linear regression model. I heard that, in data mining, after performing a stepwise selection based on the AIC criterion, it is misleading to look at the p-values to test the null hypothesis that each true regression…
25 votes · 3 answers

AIC or p-value: which one to choose for model selection?

I'm brand new to this R thing but am unsure which model to select. I did a stepwise forward regression selecting each variable based on the lowest AIC. I came up with 3 models that I'm unsure which is the "best". Model 1: Var1 (p=0.03)…
MEL
24 votes · 1 answer

Stepwise AIC - Does there exist controversy surrounding this topic?

I've read countless posts on this site that are incredibly against the use of stepwise selection of variables using any sort of criterion, whether it be p-value based, AIC, BIC, etc. I understand why these procedures are, in general, quite poor for…
21 votes · 2 answers

Does LASSO suffer from the same problems stepwise regression does?

Stepwise algorithmic variable-selection methods tend to select for models which bias more or less every estimate in regression models ($\beta$s and their SEs, p-values, F statistics, etc.), and are about as likely to exclude true predictors as…
21 votes · 1 answer

Howlers caused by using stepwise regression

I am well aware of the problems of stepwise/forward/backward selection in regression models. There are numerous cases of researchers denouncing the methods and pointing to better alternatives. I was curious if there are any stories that exist…
probabilityislogic
20 votes · 2 answers

Estimating R-squared and statistical significance from penalized regression model

I am using the R package penalized to obtain shrunken estimates of coefficients for a dataset where I have lots of predictors and little knowledge of which ones are important. After I've picked tuning parameters L1 and L2 and I'm satisfied with my…
Stephen Turner
18 votes · 4 answers

Is Fig 3.6 in Elements of Statistical Learning correct?

Here is the figure from the textbook: It shows a decreasing relationship between subset size $k$ and mean squared error (MSE) of the true parameters, $\beta$ and the estimates $\hat{\beta}(k)$. Clearly, this shouldn't be the case - adding more…
18 votes · 2 answers

Interpreting the drop1 output in R

In R, the `drop1` command outputs something neat. These two commands should get you some output: `example(step) #-> swiss` and `drop1(lm1, test="F")`. Mine looks like this: `> drop1(lm1, test="F")` Single term deletions Model: Fertility ~ Agriculture +…
gakera
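This sketch relates to the `drop1(lm1, test="F")` question above. As we understand it, each row of that table for a linear model compares the full model with the model that lacks one term:

$$F = \frac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}})/\mathrm{df}_{\text{term}}}{\mathrm{RSS}_{\text{full}}/\mathrm{df}_{\text{resid}}}$$

The code computes this statistic with NumPy on simulated data, not the `swiss` data. It assumes every term is a single numeric predictor with one degree of freedom. `single_term_deletions` is a hypothetical helper. For such terms $F$ equals the square of the full-model $t$ statistic.

```python
import numpy as np

def rss(Z, y):
    """Residual sum of squares of the OLS fit of y on the columns of Z."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return float(r @ r)

def single_term_deletions(X, y):
    """Return (column, RSS without it, F) for dropping each column of X from the full model."""
    n, p = X.shape
    Z_full = np.column_stack([np.ones(n), X])  # column 0 is the intercept
    rss_full = rss(Z_full, y)
    df_resid = n - Z_full.shape[1]
    rows = []
    for j in range(p):
        Z_red = np.delete(Z_full, j + 1, axis=1)  # drop predictor j, keep the intercept
        rss_red = rss(Z_red, y)
        df_term = 1  # assumption: each term is one numeric column
        F = ((rss_red - rss_full) / df_term) / (rss_full / df_resid)
        rows.append((j, rss_red, F))
    return rows

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = 1.0 + 3.0 * X[:, 0] + rng.normal(size=50)  # only x0 truly matters
table = single_term_deletions(X, y)
for j, rss_red, F in table:
    print(f"- x{j}: RSS = {rss_red:.2f}, F = {F:.2f}")
```

A factor with several levels would need all of its dummy columns dropped together, with $\mathrm{df}_{\text{term}}$ set to the number of columns removed.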
18 votes · 2 answers

Are there any circumstances where stepwise regression should be used?

Stepwise regression was overused in many biomedical papers in the past, but this appears to be improving with better education about its many issues. Many older reviewers, however, still ask for it. What are the circumstances where stepwise…
17 votes · 2 answers

Superiority of LASSO over forward selection/backward elimination in terms of the cross validation prediction error of the model

I obtained three reduced models from an original full model using forward selection, backward elimination, and the L1 penalization technique (LASSO). For the models obtained using forward selection/backward elimination, I obtained the cross validated…
15 votes · 2 answers

Sane stepwise regression?

Suppose I want to build a binary classifier. I have several thousand features and only a few tens of samples. From domain knowledge, I have a good reason to believe that the class label can be accurately predicted using only a few features, but I…
dsimcha
15 votes · 2 answers

LASSO/LARS vs general to specific (GETS) method

I have been wondering, why are LASSO and LARS model selection methods so popular even though they are basically just variations of step-wise forward selection (and thus suffer from path dependency)? Similarly, why are General to Specific (GETS)…