Feature selection for unknown parametric model

Question

Suppose one has about 500 points of 50-dimensional data that one knows a priori are derived from a parametric model (perhaps with some outliers). Does this knowledge help in feature selection? I am interested in both the continuous and the discrete response cases, if they differ.

A related, more general question is whether linear models and criteria like AIC are generally useful for feature selection, or whether their usefulness depends on strong model assumptions. I have heard that it can be wise to use "non-parametric" methods of feature selection. Are there any guidelines here?

score 1 · Answer 1 · answered Dec 14 '14 at 11:27

For general and parametric approaches to feature selection, the following introductory paper might be helpful: http://machinelearning.wustl.edu/mlpapers/paper_files/GuyonE03.pdf. Another paper presents a rather comprehensive overview of both parametric and non-parametric techniques: http://www.psb.ugent.be/~yvsae/pdf/fssreview_Bioinformatics_2007_23_19_2507.pdf.

Earlier and more general (fundamental) papers on feature selection in machine learning include http://www.aaai.org/Papers/Symposia/Fall/1994/FS-94-02/FS94-02-034.pdf and http://sci2s.ugr.es/keel/pdf/specific/articulo/Blum_Selection_1997.pdf.

For non-parametric approaches, in particular for dimensionality reduction, I would recommend considering popular methods such as principal component analysis (PCA) (http://en.wikipedia.org/wiki/Principal_component_analysis) and exploratory factor analysis (EFA) (see http://en.wikipedia.org/wiki/Factor_analysis and http://en.wikipedia.org/wiki/Exploratory_factor_analysis). EFA is usually preferable to PCA when researchers are interested in discovering the latent structure of the data (the latent variables are then usually called factors).
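To make the PCA suggestion concrete, here is a minimal sketch using only NumPy. The data are simulated and purely hypothetical (500 points in 50 dimensions driven by 3 latent variables), and the helper name `pca` is an illustration, not any library's API.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T
    # Share of total variance carried by each component.
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, components, explained_ratio[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: 3 latent variables mixed into 50 observed ones, plus noise.
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 50))
X = latent @ loadings + 0.1 * rng.normal(size=(500, 50))

scores, components, ratio = pca(X, n_components=3)
print(scores.shape)   # reduced representation of the data
print(ratio.sum())    # fraction of variance kept by 3 components
```

Note that each component is a linear combination of all 50 original variables, so this reduces the dimension rather than selecting a subset of the original features.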

Aleksandr Blekh


Just for future reference, `[name](url)` will make a link. – Andy Jones Dec 14 '14 at 17:21
Will PCA take into account the interaction between the response and the predictors? As I understood PCA is just analyzing the lower-dimensional subspace on which data lives. But if you do PCA on the predictors, how do you know you are not throwing away predictors that are very important to the response? – Helmut Dec 14 '14 at 17:23
@AndyJones: Thank you for the advice - I keep forgetting about that shortcut. – Aleksandr Blekh Dec 15 '14 at 00:57
@Helmut: I understand your concern, and it's valid. Fortunately, there is a solution that should alleviate it: [principal component regression (PCR)](http://en.wikipedia.org/wiki/Principal_component_regression), a technique in which the response variable is regressed on the principal components instead of the original predictors. In other words, it's a combination of PCA and linear regression. The answers in [this discussion](http://stats.stackexchange.com/q/33053/31372) are very good and should help you achieve your goal. – Aleksandr Blekh Dec 15 '14 at 01:10
@Helmut: In addition to the `pls` package recommended by @Peter Flom, the [plsdepot](http://cran.r-project.org/web/packages/plsdepot) R package can also be used to perform PLS regression. See the examples [here](http://gastonsanchez.com/plsdepot_plsreg1.pdf) and [here](http://gastonsanchez.com/plsdepot_plsreg2.pdf). – Aleksandr Blekh Dec 15 '14 at 01:23
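
The PCR idea discussed in the comments above can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not the `pls` or `plsdepot` implementation: the helper names `pcr_fit` and `r_squared` are hypothetical, and the data are simulated so that the response depends only on the lowest-variance predictor.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Regress y on the top principal components of X (computed without y)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                  # (p, k) principal directions
    Z = Xc @ V                               # component scores
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = V @ gamma                         # coefficients on the original predictors
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def r_squared(X, y, beta, intercept):
    """In-sample coefficient of determination."""
    resid = y - (X @ beta + intercept)
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n, p = 500, 50
# Hypothetical predictors with decreasing standard deviation from 5.0 down to 0.5.
scales = np.linspace(5.0, 0.5, p)
X = rng.normal(size=(n, p)) * scales
# Assumption for illustration: only the lowest-variance predictor drives y.
y = 3.0 * X[:, -1] + 0.1 * rng.normal(size=n)

r2_few = r_squared(X, y, *pcr_fit(X, y, n_components=5))
r2_all = r_squared(X, y, *pcr_fit(X, y, n_components=p))
print(r2_few, r2_all)
```

In this constructed case the fit with 5 components explains almost none of the response, while keeping all 50 components recovers ordinary least squares. Because the components are chosen from the predictors alone, the number of components retained is usually tuned against the response (for example by cross-validation), and PLS picks its directions using the response directly.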
