I have a small dataset (55 samples) described by 20 features.
I trained an SVM with an RBF kernel, using cross-validation on 70% of the dataset (the training part), and recorded the average AUC for 150 combinations of features that could make sense for the experiment. (I had tried feature selection beforehand, but without success.)
I get very good results (close to 99% AUC) for some combinations of features and poor ones (51%, for example) for others.
My question is: which statistical approach should I use to correctly assess which combinations are better than others?
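
For reference, the procedure described above can be sketched roughly as follows. This is only a minimal illustration, not my exact code: the data are synthetic, and the use of scikit-learn, the scaling step, the 5-fold stratified split, and the 15 feature pairs (standing in for the 150 combinations) are all assumptions made for the example.

```python
# Minimal sketch of the setup: RBF SVM, cross-validated AUC per feature subset.
# Data, subsets, and CV settings below are placeholders (assumptions).
from itertools import combinations

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(55, 20))      # 55 samples, 20 features (synthetic)
y = np.tile([0, 1], 28)[:55]       # synthetic binary labels
X[:, 0] += 1.5 * y                 # make feature 0 informative for the demo

# Keep 70% for training; the remaining 30% is left untouched here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0
)

# Hypothetical candidate subsets: all pairs among the first 6 features.
candidate_subsets = list(combinations(range(6), 2))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Average cross-validated AUC for each feature subset, on the training part only.
mean_auc = {}
for subset in candidate_subsets:
    scores = cross_val_score(
        model, X_train[:, list(subset)], y_train, cv=cv, scoring="roc_auc"
    )
    mean_auc[subset] = scores.mean()

# Rank subsets by mean AUC, best first.
ranked = sorted(mean_auc.items(), key=lambda kv: kv[1], reverse=True)
for subset, auc in ranked[:3]:
    print(subset, round(auc, 3))
```

The ranking in `ranked` is the list of average AUCs that I would like to compare statistically.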