
I know that classification performance can be tested for significance with a permutation test that permutes the class labels. Could a similar permutation test be used to test a feature ranking (i.e., a ranking of the features by their relevance to the classification) for significance?

Thanks

Pugl
  • Not an exact answer! As far as I know, there is no known sampling distribution for a measure such as NDCG. It is, however, possible to estimate the distribution empirically using the bootstrap, and the resulting intervals allow t-test-like assessments. See more on this topic at: http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings11/pdf/EVIA/04-EVIA2014-SoboroffI.pdf – spdrnl Nov 20 '15 at 16:55
  • How do you estimate the features' relevance? Weight magnitude? Performance drop due to exclusion? – Trisoloriansunscreen Nov 23 '15 at 20:13
  • @Tal Yes, usually the weights for linear SVMs, and otherwise recursive feature elimination – Pugl Nov 23 '15 at 21:44

1 Answer

If you repeat your training with multiple (independent!) training sets, you will have multiple realizations of the ranking. That is then a much more standard problem, for which you can find references here on CV.
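One way to read "a much more standard problem" is as a test of agreement among several rankings. The sketch below is only an illustration under that assumption: it uses Kendall's coefficient of concordance W with a permutation null, which is my choice and not something prescribed above. The helper names (`ranks_from_scores`, `kendalls_w`, `concordance_test`) and the toy scores are hypothetical.

```python
import numpy as np

def ranks_from_scores(scores):
    """Convert an (m runs x n features) score matrix to ranks; rank 1 = largest score."""
    order = np.argsort(-scores, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(scores.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, scores.shape[1] + 1)
    return ranks

def kendalls_w(ranks):
    """Kendall's W for an (m x n) matrix of tie-free ranks (1 = perfect agreement)."""
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

def concordance_test(ranks, n_perm=1000, seed=0):
    """Permutation p-value: shuffle each run's ranking independently to build the null."""
    rng = np.random.default_rng(seed)
    observed = kendalls_w(ranks)
    null = np.array([
        kendalls_w(np.array([rng.permutation(row) for row in ranks]))
        for _ in range(n_perm)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Toy stand-in for relevance scores from 10 independent training sets, 8 features.
rng = np.random.default_rng(1)
true_relevance = np.linspace(2.0, 0.0, 8)
scores = true_relevance + rng.normal(scale=0.3, size=(10, 8))
ranks = ranks_from_scores(scores)
w_obs, p_val = concordance_test(ranks)
print(f"Kendall's W = {w_obs:.2f}, permutation p = {p_val:.4f}")
```

A small p-value here says only that the rankings agree more than randomly ordered ones would; it does not say that any particular position in the ranking is reliable.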

Regarding the case of a single trained classifier, this isn't exactly what you asked for, but you can compare each weight with its null distribution, obtained by training with shuffled labels. See http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3997040/
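A minimal sketch of the shuffled-label comparison, under some assumptions of mine: a least-squares linear classifier stands in for the linear SVM to keep the example dependency-free, the data are simulated, and the function names are hypothetical. It is not the exact procedure of the linked paper.

```python
import numpy as np

def fit_linear_weights(X, y):
    """Least-squares linear classifier weights (a stand-in for a linear SVM)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return w

def permutation_pvalues(X, y, n_perm=500, seed=0):
    """Per-feature p-values: how often a shuffled-label |weight| reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = np.abs(fit_linear_weights(X, y))
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        null_w = np.abs(fit_linear_weights(X, rng.permutation(y)))
        exceed += null_w >= observed
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)

# Toy data: 200 samples, 5 features, only feature 0 carries class information.
rng = np.random.default_rng(42)
n = 200
y = np.repeat([-1.0, 1.0], n // 2)
X = rng.normal(size=(n, 5))
X[:, 0] += y

observed, pvals = permutation_pvalues(X, y)
for j, (w, p) in enumerate(zip(observed, pvals)):
    print(f"feature {j}: |w| = {w:.3f}, p = {p:.3f}")
```

These are uncorrected per-feature p-values, so with many features some correction for multiple comparisons would be needed.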

Having said that, you should ask yourself whether you are over-interpreting classifier weights. Except for Gaussian Naive Bayes (in which the weights directly reflect the individual features' SNR, as univariate measures would), an individual feature's weight doesn't tell us much about the feature itself, since the weight can be understood only in the context of all of the other features. The most evident case in which a feature's weight might be misleading is when the feature faithfully samples a noise source that masks signal in other features. A strong weight for this feature will reflect that it is subtracted from other features, not that it has any signal in it. See http://www.ncbi.nlm.nih.gov/pubmed/24239590
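The noise-source case can be illustrated with a small simulation. This is a sketch under my own assumptions (two simulated features, a least-squares linear classifier as a stand-in for whatever classifier is actually used, and hypothetical variable names):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = np.repeat([-1.0, 1.0], n // 2)

# A shared noise source that masks the class signal in the first feature.
noise = rng.normal(scale=3.0, size=n)
x_signal = y + noise                             # signal buried in noise
x_noise = noise + rng.normal(scale=0.1, size=n)  # samples the noise only, no signal

X = np.column_stack([x_signal, x_noise])
Xc = X - X.mean(axis=0)
w, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)

# Univariate view: correlation of each feature with the labels.
univariate = [np.corrcoef(X[:, j], y)[0, 1] for j in range(2)]

print("multivariate weights:   ", np.round(w, 2))
print("univariate correlations:", np.round(univariate, 2))
```

In this setup the noise-only feature receives a weight about as large in magnitude as the signal feature's, with the opposite sign, because subtracting it cancels the shared noise. Its univariate correlation with the labels stays near zero.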

In my opinion, if you'd like to say something about individual features, use univariate methods.

Trisoloriansunscreen