I am new to machine learning and am very lost trying to decide which features to use from a data set. The data set has over 25,000 observations and just under 500 features. I have a binary churn variable (0/1), where 0 means churn, and I am attempting to build a classification model for it. Does it make sense to fit a random forest, take the top 10 variables by importance from that model, and use them in a logistic regression? I am hoping the logistic regression will make the results more interpretable for presentation purposes.
1 Answer
Using the top 10 variables from a random forest, ranked by importance, seems like a completely valid idea to me. However, it also depends on the data and on how importance is distributed across the variables: if importance is spread thinly over many variables, a cutoff at 10 may discard useful signal. I would definitely try more than one method. I am a big fan of random forests, and they can even be made interpretable, as discussed here.
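As a rough illustration of the workflow in the question, here is a minimal sketch using scikit-learn. It is only one way to do this, and it rests on assumptions: the data is synthetic (generated with `make_classification` as a smaller stand-in for the real churn set), and the hyperparameters and variable names such as `top_idx` are placeholders I picked, not anything from the original data. The features are ranked on the training split only, so the held-out comparison is not biased by the selection step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data standing in for the real churn set
# (the real one has ~25,000 rows and ~500 columns).
X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: fit a random forest on the training split only.
forest = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)
forest.fit(X_train, y_train)

# Step 2: indices of the 10 features with the highest impurity-based importance.
top_idx = np.argsort(forest.feature_importances_)[::-1][:10]

# Step 3: logistic regression on those 10 features, standardized so that
# the coefficients are on a comparable scale.
logit = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logit.fit(X_train[:, top_idx], y_train)

# Step 4: compare both models on held-out data to see what the simpler model costs.
auc_forest = roc_auc_score(y_test, forest.predict_proba(X_test)[:, 1])
auc_logit = roc_auc_score(y_test, logit.predict_proba(X_test[:, top_idx])[:, 1])
print(f"forest AUC: {auc_forest:.3f}, logistic AUC (top 10): {auc_logit:.3f}")

# Coefficients of the logistic model, one per selected feature.
coefs = logit.named_steps["logisticregression"].coef_.ravel()
for idx, coef in zip(top_idx, coefs):
    print(f"feature {idx}: {coef:+.3f}")
```

Two things to keep in mind with the real data. Since 0 is coded as churn, `predict_proba(...)[:, 1]` gives the probability of not churning, so the coefficient signs should be read accordingly. Also, impurity-based importances can favor high-cardinality features, which is one more reason to compare against another selection method.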

discipulus