Questions tagged [feature-selection]

Methods and principles of selecting a subset of attributes for use in further modelling

Feature selection, also called attribute selection or feature reduction, refers to techniques for identifying a subset of features of a data set that are relevant to a given problem. By removing irrelevant and redundant features, successful feature selection can avoid the curse of dimensionality and improve the performance, speed, and interpretability of subsequent models.

Feature selection includes manual methods (such as those based on domain knowledge) and automatic methods. Automatic methods are often categorized into filter, wrapper, and embedded approaches.

Filter approaches perform feature selection as a separate preprocessing step before the learning algorithm, and thus consider only intrinsic properties of the data. Filter methods include Wilcoxon rank-sum tests and correlation-based tests.
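
As a concrete sketch of a filter method (hypothetical names: X, y, and k are placeholders, assuming a two-class problem with features in columns), a rank-sum screen might look like:

    import numpy as np
    from scipy.stats import ranksums

    def rank_sum_filter(X, y, k=10):
        """Keep the k features whose Wilcoxon rank-sum p-value is smallest."""
        pvals = np.array([
            ranksums(X[y == 0, j], X[y == 1, j]).pvalue  # class 0 vs class 1
            for j in range(X.shape[1])
        ])
        return np.argsort(pvals)[:k]  # indices of the k smallest p-values

Note that the learning algorithm is never consulted; that independence from the downstream model is what makes this a filter.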

Wrapper approaches use the performance of a learning algorithm to select features. A search algorithm is “wrapped” around the learning algorithm to ensure that the space of feature subsets is adequately searched. Wrapper methods can thus be seen as conducting the model hypothesis search within the feature subset search. Examples of wrapper approaches are simulated annealing and beam search.
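
A minimal wrapper sketch, using greedy forward selection (a simpler search than the simulated annealing or beam search named above) wrapped around a logistic-regression learner; the dataset and all parameters are arbitrary illustrative choices:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)

    selector = SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=5,   # target subset size (illustrative)
        direction="forward",      # greedily add the best feature each step
        cv=5,                     # candidate subsets scored by 5-fold CV
    )
    selector.fit(X, y)
    print(selector.get_support(indices=True))  # indices of chosen features

Here the learner's cross-validated performance drives the subset search, which is the defining trait of a wrapper.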

Embedded approaches incorporate variable selection as part of the training process, with feature relevance obtained analytically from the objective of the learning model. Embedded methods can be seen as a search in the combined space of feature subsets and hypotheses. Examples of embedded approaches are boosting and the lasso.
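
A minimal embedded sketch using the lasso: the L1 penalty zeroes out coefficients during training, so selection falls out of the fit itself rather than from a separate search (data parameters are arbitrary):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV

    X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                           noise=10.0, random_state=0)

    lasso = LassoCV(cv=5).fit(X, y)         # penalty strength chosen by CV
    selected = np.flatnonzero(lasso.coef_)  # features with nonzero weights
    print(selected)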

2232 questions
228 votes • 8 answers

Algorithms for automatic model selection

I would like to implement an algorithm for automatic model selection. I am thinking of doing stepwise regression but anything will do (it has to be based on linear regressions though). My problem is that I am unable to find a methodology, or an…
S4M • 2,432 • 3 • 13 • 6
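
A hedged sketch of the kind of procedure this question asks for: forward stepwise selection built only on linear regressions, with cross-validated $R^2$ as an illustrative selection criterion (the function name and stopping rule are placeholders, not the asker's):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    def forward_stepwise(X, y, cv=5):
        """Greedily add the predictor that most improves CV R^2; stop when none helps."""
        remaining = list(range(X.shape[1]))
        chosen, best = [], -np.inf
        while remaining:
            scores = {
                j: cross_val_score(LinearRegression(), X[:, chosen + [j]], y,
                                   cv=cv, scoring="r2").mean()
                for j in remaining
            }
            j_star = max(scores, key=scores.get)
            if scores[j_star] <= best:  # no candidate improves the score
                break
            best = scores[j_star]
            chosen.append(j_star)
            remaining.remove(j_star)
        return chosen
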
114 votes • 4 answers

Why does the Lasso provide Variable Selection?

I've been reading Elements of Statistical Learning, and I would like to know why the Lasso provides variable selection and ridge regression doesn't. Both methods minimize the residual sum of squares and have a constraint on the possible values of…
Zhi Zhao • 1,352 • 3 • 9 • 9
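
For readers comparing the two penalties, the constrained formulations discussed in The Elements of Statistical Learning are $$\hat\beta^{\text{lasso}} = \arg\min_\beta \sum_{i=1}^n \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2 \quad \text{subject to} \quad \sum_{j=1}^p |\beta_j| \le t,$$ with ridge replacing the constraint by $\sum_{j=1}^p \beta_j^2 \le t$. The $\ell_1$ ball has corners on the coordinate axes, so the constrained optimum can land exactly on a point where some $\beta_j = 0$; the smooth $\ell_2$ ball has no corners, which is why ridge shrinks coefficients toward zero but rarely to exactly zero.
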
101 votes • 3 answers

Feature selection and cross-validation

I have recently been reading a lot on this site (@Aniko, @Dikran Marsupial, @Erik) and elsewhere about the problem of overfitting occurring with cross-validation (Smialowski et al 2010 Bioinformatics; Hastie, Elements of Statistical Learning). The…
BGreene • 3,045 • 4 • 16 • 33
90 votes • 6 answers

Feature selection for "final" model when performing cross-validation in machine learning

I am getting a bit confused about feature selection and machine learning and I was wondering if you could help me out. I have a microarray dataset that is classified into two groups and has 1000s of features. My aim is to get a small number of…
83 votes • 11 answers

What are disadvantages of using the lasso for variable selection for regression?

From what I know, using lasso for variable selection handles the problem of correlated inputs. Also, since it is equivalent to Least Angle Regression, it is not slow computationally. However, many people (for example people I know doing…
xuexue • 2,098 • 2 • 16 • 11
77 votes • 6 answers

Variable selection for predictive modeling really needed in 2016?

This question was asked on CV some years ago; it seems worth a repost in light of 1) order-of-magnitude better computing technology (e.g. parallel computing, HPC etc.) and 2) newer techniques, e.g. [3]. First, some context. Let's assume the goal…
71 votes • 5 answers

Using principal component analysis (PCA) for feature selection

I'm new to feature selection and I was wondering how you would use PCA to perform feature selection. Does PCA compute a relative score for each input variable that you can use to filter out noninformative input variables? Basically, I want to be…
Michael • 2,180 • 4 • 23 • 32
67 votes • 3 answers

Variables are often adjusted (e.g. standardised) before making a model - when is this a good idea, and when is it a bad one?

In what circumstances would you want to, or not want to, scale or standardize a variable prior to model fitting? And what are the advantages / disadvantages of scaling a variable?
62 votes • 2 answers

A more definitive discussion of variable selection

Background I'm doing clinical research in medicine and have taken several statistics courses. I've never published a paper using linear/logistic regression and would like to do variable selection correctly. Interpretability is important, so no fancy…
sharper_image • 737 • 7 • 10
60 votes • 5 answers

How does one interpret SVM feature weights?

I am trying to interpret the variable weights given by fitting a linear SVM (I'm using scikit-learn):

    from sklearn import svm
    svm = svm.SVC(kernel='linear')
    svm.fit(features, labels)
    svm.coef_

I cannot find anything in the documentation that…
Austin Richardson • 928 • 1 • 8 • 10
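
A sketch of one common way to read those weights: standardize the features so coefficient magnitudes are comparable, then rank by absolute weight (synthetic data stands in for the asker's features and labels; the scaling step is an added assumption):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # stand-in for the asker's (features, labels); any 2-class data works
    features, labels = make_classification(n_samples=100, n_features=8,
                                           random_state=0)

    model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
    model.fit(features, labels)

    weights = model.named_steps['svc'].coef_[0]  # binary case: one weight row
    order = np.argsort(np.abs(weights))[::-1]    # largest |weight| first
    print(order)
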
56 votes • 4 answers

Can a random forest be used for feature selection in multiple linear regression?

Since RF can handle non-linearity but can't provide coefficients, would it be wise to use random forest to gather the most important features and then plug those features into a multiple linear regression model in order to obtain their coefficients?…
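
A hedged sketch of the two-stage idea this question proposes, with synthetic data and an arbitrary "median importance" cutoff standing in for real modelling choices:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import LinearRegression

    X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                           noise=5.0, random_state=0)

    # Stage 1: random forest ranks features by impurity importance.
    picker = SelectFromModel(RandomForestRegressor(random_state=0),
                             threshold="median").fit(X, y)
    kept = picker.get_support(indices=True)

    # Stage 2: ordinary linear regression on the kept features for coefficients.
    ols = LinearRegression().fit(X[:, kept], y)
    print(kept, ols.coef_)
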
47 votes • 7 answers

Features for time series classification

I consider the problem of (multiclass) classification based on time series of variable length $T$, that is, to find a function $$f(X_T) = y \in [1..K]\\ \text{for } X_T = (x_1, \dots, x_T)\\ \text{with } x_t \in \mathbb{R}^d ~,$$ via a global…
Emile • 3,150 • 2 • 20 • 17
47 votes • 7 answers

Choosing variables to include in a multiple linear regression model

I am currently working to build a model using a multiple linear regression. After fiddling around with my model, I am unsure how to best determine which variables to keep and which to remove. My model started with 10 predictors for the DV. When…
44 votes • 5 answers

Using LASSO from lars (or glmnet) package in R for variable selection

Sorry if this question comes across a little basic. I am looking to use LASSO variable selection for a multiple linear regression model in R. I have 15 predictors, one of which is categorical (will that cause a problem?). After setting my $x$ and $y$…
James • 441 • 1 • 5 • 4
41 votes • 4 answers

How can SVM 'find' an infinite feature space where linear separation is always possible?

What is the intuition behind the fact that an SVM with a Gaussian Kernel has infinite dimensional feature space?
user36162 • 551 • 1 • 5 • 4