
I think the answer to this question should be simple, but I have searched the internet for a while and not found anything.

I have a set of categorical attributes and a continuous target variable. My aim is to understand attribute importance. I'm not that familiar with statistical methods, but I do know machine learning, so I thought of building a Random Forest. However, it's complex, and I suspect there is a better way.

I saw this post, which seemed to suggest ANOVA, but I want to do more than reject a null hypothesis.
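(Not from the original post; an illustrative sketch.) One way to go beyond rejecting a null hypothesis with ANOVA is to report an effect size such as eta squared, the share of the target's variance explained by the categorical grouping. The data below is invented for the example; only NumPy is assumed.

```python
import numpy as np

# Hypothetical continuous target values, split by the levels of one
# categorical attribute (three groups of observations).
groups = [np.array([10.0, 9.5, 11.0]),
          np.array([12.0, 13.0, 12.5]),
          np.array([14.0, 13.5, 15.0])]

all_vals = np.concatenate(groups)
grand_mean = all_vals.mean()

# Between-group sum of squares: variation of group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Total sum of squares: all variation around the grand mean.
ss_total = ((all_vals - grand_mean) ** 2).sum()

# Eta squared: fraction of total variance explained by the grouping (0 to 1).
eta_squared = ss_between / ss_total
print(eta_squared)
```

A value near 1 means the attribute accounts for most of the variance in the target; near 0 means it explains little, regardless of whether the ANOVA F-test is "significant".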

I had a look at the scikit-learn cheat sheet, which seemed to point to Gradient Descent. I looked into this but couldn't find any examples using categorical variables. Is that possible, and if so, could someone point me in the direction of an example?
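(Not from the original post; a minimal sketch of the Random Forest approach the asker mentions.) Categorical attributes can be one-hot encoded so a scikit-learn regressor accepts them, and the forest's `feature_importances_` can then be summed per original attribute. All column names and data here are invented for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: categorical product attributes and continuous sales.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size":  ["S", "M", "L", "S", "M", "L"],
    "sales": [10.0, 12.5, 9.0, 14.0, 11.0, 13.5],
})

# One-hot encode the categorical attributes into dummy columns.
X = pd.get_dummies(df[["color", "size"]])
y = df["sales"]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Importances come back per dummy column (e.g. "color_red");
# sum them back to the original attribute names.
importances = pd.Series(model.feature_importances_, index=X.columns)
per_attribute = importances.groupby(
    importances.index.str.split("_").str[0]).sum()
print(per_attribute)
```

Note the caveats raised in the comments: these importances are a model diagnostic, not a statistical measure, and one-hot encoding spreads an attribute's importance across its levels, which is why the per-attribute sum is taken at the end.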

  • Feature importance is a slippery concept, and borders on being non-statistical. The "feature importance" from a random forest is just a name someone gave to a diagnostic of a model; you shouldn't take the name too seriously. Can you answer this question: if you knew which features were "most important", what would you hope to do with that information? What decisions would it help with? In broad strokes, what does "feature importance" mean to you? – Matthew Drury Sep 05 '17 at 01:35
  • In this scenario, someone else is going to act on the result. Each feature is an attribute of product sales; we want to know how each feature affects sales. – soundofsilence Sep 05 '17 at 06:10
  • In general I agree with @MatthewDrury, but would like to add that "feature importance" might be useful in forming new hypotheses and in measuring/engineering new features that might improve predictions. – Krrr Sep 05 '17 at 09:30

0 Answers