I am trying to do some feature selection in gene expression data with 22215 features. I followed the tutorial here.

I first applied a filter method (t-test) to rank the features by p-value and kept the 100 with the lowest p-values. Then I tried to apply sequential feature selection to those 100 features with an SVM classifier. However, when I run

[fs1, history] = sequentialfs(@SVM_class_fun, reducedL, yS1, 'cv', c);

it always returns only the first feature: in fs1, every entry except the first is 0. If I try to force it to return 10 features with

[fs1, history] = sequentialfs(@SVM_class_fun, reducedL, yS1, 'cv', c, 'nfeatures', 10);

it gives me the first 10 features selected by the filter method, i.e. those with the lowest p-values.

Here is my SVM_class_fun:

function err = SVM_class_fun(xTrain, yTrain, xTest, yTest)
  model = svmtrain(xTrain, yTrain, 'Kernel_Function', 'rbf', 'boxconstraint', 10);
  err = sum(svmclassify(model, xTest) ~= yTest);
end

So this means that using sequentialfs is not helpful in this case.

Note that I have just 12 examples, so my data matrix has dimensions 12x22215. Might this be the issue?

Can anyone provide some insights?

user34790
  • The fact that you are using a boxconstraint value of 10 is probably causing an issue. Try a value of 1. Otherwise your code seems fine. Wrapper feature selection on such a wide data set is always going to be difficult. – BGreene Mar 26 '13 at 15:56
  • Also doing feature selection on the whole data set prior to classification can heavily bias results. See similar question here: http://stats.stackexchange.com/questions/27750/feature-selection-and-cross-validation?rq=1 – BGreene Mar 26 '13 at 15:57

1 Answer

There is a problem with the order of the arguments (xTrain, yTrain, ...) in the function called by sequentialfs. The order is actually xTrain, xTest, yTrain, yTest. Changing the function signature accordingly fixed my problems.
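
The argument order can also be checked empirically rather than assumed. Below is a minimal sketch, assuming a hypothetical helper named debug_class_fun (not part of the original post) that prints the number of rows of each argument it receives. Arguments with the same row count belong to the same split, so the printed pattern shows which positions hold training data and which hold test data.

```matlab
function err = debug_class_fun(a, b, c, d)
  % Hypothetical diagnostic criterion (an assumption, not from the post):
  % print the number of rows of each argument, in the order received.
  fprintf('rows: %d %d %d %d\n', size(a, 1), size(b, 1), size(c, 1), size(d, 1));
  % Return a constant scalar; this function is for inspection only,
  % so the feature selection it produces is meaningless.
  err = 0;
end
```

Passing it in place of the real criterion, e.g. sequentialfs(@debug_class_fun, reducedL, yS1, 'cv', c), prints one line per call. This only helps when the training and test folds differ in size; with equal-sized folds the row counts cannot tell the splits apart.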

gung - Reinstate Monica