
I am new to R and to hyperspectral data analysis. However, in my research I have found that many sources warn against using stepwise discriminant analysis (based on Wilks' lambda or the Mahalanobis distance) to find the best subset of variables with which 'satisfactory' discrimination performance can be obtained.

I have come across some suggestions:

- PLS: http://cran.r-project.org/web/packages/pls/
- LARS: http://cran.r-project.org/web/packages/lars/index.html

I am also realizing that the answers to this question might be useful:

What are modern, easily used alternatives to stepwise regression?

Given the nature of hyperspectral data (highly correlated and highly redundant), I would like to find the 10 bands that discriminate most efficiently between about 30 plant species. Any suggestions would be most valued.

user2507608
  • Welcome to cross validated! I think we need more information about *why* you want to select 10 bands. The answers will differ whether that is for regularization or for instrumental reasons. – cbeleites unhappy with SX Sep 23 '13 at 10:52

1 Answer

Hyperspectral data sets are often wide: many spectral channels versus not so many independent rows, particularly as the rows in the data set are often not independent of each other (e.g. spatially resolved spectra of only a few samples/cases). That is why some kind of regularization is usually needed.

In addition, most hyperspectral data differ from, say, microarray data in that the spectral axis is in reality continuous but is discretized into spectral channels (the columns). From a spectroscopic point of view, spectra are of good quality if this continuity is captured, so spectroscopically good data will show high correlation between neighbouring columns/measurement channels.
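To illustrate this point, here is a small sketch (all data simulated, all names made up for illustration) showing that smooth, spectra-like curves produce very high correlation between adjacent channels:

```r
## Simulated smooth "spectra": sums of broad Gaussians plus a little noise.
set.seed(1)
wl <- seq(400, 2400, by = 10)   # hypothetical wavelength axis (nm)
n  <- 50                        # number of spectra
spc <- t(replicate(n, {
  peaks <- rowSums(sapply(1:3, function(i)
    runif(1) * dnorm(wl, mean = runif(1, 500, 2300), sd = runif(1, 100, 300))))
  peaks / max(peaks) + rnorm(length(wl), sd = 0.01)
}))

## Correlation between each channel and its right-hand neighbour:
adj.cor <- sapply(seq_len(ncol(spc) - 1),
                  function(j) cor(spc[, j], spc[, j + 1]))
summary(adj.cor)   # for smooth spectra, these are typically close to 1
```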

This means that for hyperspectral data you would also expect the model coefficients to behave smoothly. I therefore prefer the regularization offered by PLS over, e.g., the lasso: variable selection (shrinking coefficients to exactly zero) does not seem particularly appropriate from a spectroscopic point of view.
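A minimal PLS-DA sketch with the pls package mentioned in the question, assuming simulated data (in practice `X` would be your n x p spectra matrix and `species` a factor with ~30 levels); coding the classes as an indicator matrix and regressing with `plsr` is one common way to do discriminant analysis with PLS:

```r
library(pls)
set.seed(2)
n <- 90; p <- 200
species <- factor(rep(c("A", "B", "C"), each = n / 3))
X <- matrix(rnorm(n * p), n, p) + as.integer(species)  # crude class offsets
Y <- model.matrix(~ species - 1)                       # indicator (dummy) matrix

fit <- plsr(Y ~ X, ncomp = 10, validation = "CV")

## predicted class = column with the largest fitted indicator value
pred <- predict(fit, ncomp = 10)
predicted <- levels(species)[apply(pred[, , 1], 1, which.max)]
table(species, predicted)
```

Use cross-validation (here the built-in `validation = "CV"`) to choose `ncomp`; with spatially resolved data, make sure spectra from the same sample stay in the same fold.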

However, if the reason for wanting to find 10 bands is that you later want to build an instrument based on, say, filters, then the lasso (or e.g. randomForest) and other methods that similarly shrink coefficients to zero are more appropriate.
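As a sketch of lasso-type band selection, here I use glmnet rather than lars (glmnet is not mentioned in the question, but it handles multinomial responses, which a multi-species problem needs); data are again simulated:

```r
library(glmnet)
set.seed(3)
n <- 90; p <- 200
species <- factor(rep(c("A", "B", "C"), each = n / 3))
X <- matrix(rnorm(n * p), n, p) + as.integer(species)

## "grouped" ties each band's coefficients together across classes,
## so a band is either selected for all classes or dropped entirely.
cvfit <- cv.glmnet(X, species, family = "multinomial",
                   type.multinomial = "grouped")

## Bands with non-zero coefficients at the CV-chosen penalty:
coefs <- coef(cvfit, s = "lambda.1se")   # list: one coefficient vector per class
selected <- which(rowSums(abs(sapply(coefs, function(m)
  as.matrix(m)[-1, ]))) > 0)             # drop intercept, keep non-zero bands
head(selected, 10)
```

Note that with highly correlated neighbouring bands the lasso tends to pick one band from a correlated group somewhat arbitrarily, which is worth keeping in mind when interpreting the selection.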

Just to repeat what I commented in the other question: I recommend a look into The Elements of Statistical Learning.


Although it is advertising my own package and is off-topic to your actual question: if you don't know it already, you may want to check out hyperSpec, which I wrote to facilitate working with hyperspectral data in R.

cbeleites unhappy with SX