I have 13 features in a classification task, and I use Random Forest, L1 logistic regression, and L2 logistic regression as separate classifiers whose performance I would like to compare. Although they perform similarly, the feature importances from Random Forest and from logistic regression (based on coefficients) differ slightly, particularly for the best feature. I am only interested in the best 3 features. All 3 classifiers pick the same 3, but the first (the best of the best) differs. Can you explain whether this is undesirable and, if possible, why it can happen? Thank you.

ssm
    related: http://stats.stackexchange.com/questions/164048/can-random-forest-be-used-for-feature-selection-in-multiple-linear-regression/164068#164068 – Sycorax Mar 25 '16 at 01:15

1 Answer

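There's no reason at all to expect feature importance rankings to coincide between algorithms this different. Random Forest importance is typically measured either as the mean decrease in node impurity from splits on a predictor, or as the drop in performance when that predictor's values are permuted. GLM importance is based on the magnitude of the coefficients, which also depends on the scale of each feature. The two measure different things, so the same three features can come out on top in a different order.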
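To make the comparison concrete, here is a minimal sketch that assumes scikit-learn and a synthetic 13-feature dataset from `make_classification` (a stand-in, not the asker's data). The `importance` helper and the hyperparameters are illustrative choices. Features are standardized first so that coefficient magnitudes are at least comparable to each other.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 13-feature problem (assumption: not the asker's data).
X, y = make_classification(
    n_samples=500, n_features=13, n_informative=3, n_redundant=0, random_state=0
)
X = StandardScaler().fit_transform(X)  # put all features on the same scale

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "logreg_l1": LogisticRegression(penalty="l1", solver="liblinear", random_state=0),
    "logreg_l2": LogisticRegression(penalty="l2", solver="liblinear", random_state=0),
}


def importance(model):
    """Return one non-negative score per feature for a fitted model (hypothetical helper)."""
    if hasattr(model, "feature_importances_"):
        return model.feature_importances_   # impurity-based importance, sums to 1
    return np.abs(model.coef_).ravel()      # absolute coefficient magnitude


top3 = {}
for name, model in models.items():
    model.fit(X, y)
    scores = importance(model)
    top3[name] = np.argsort(scores)[::-1][:3]  # indices of the 3 largest scores
    print(name, top3[name].tolist())
```

Comparing the printed index lists shows whether the three models agree on the top-3 set and on the order within it. The scores themselves are on different scales and should not be compared across models.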

Firebug