Is the Decision Tree algorithm the best for supervised learning for a classification problem with non-linear relationships?

Question

I have a dataset with 1000+ features and over 1 million rows. The target variable is binary (yes or no), and the features are all numeric, with values ranging from 0 to 100k+.

My goal is to understand which features contributed the most to each instance. My main emphasis is on which features contributed to the binary target, so interpretability matters more than accuracy.

My question is: are decision trees in scikit-learn the best suited to interpreting non-linear relationships in a classification problem?

When you say "My goal is to understand which features contributed the most", it sounds as if you seek to identify not just good predictors but causal relationships. With 1000+ features that figures to be a long, long task. It's a very interesting problem; maybe in an answer someone could describe a case in which such an analysis was done effectively. — rolando2, Mar 28 '18 at 20:47

Jakub Bartczuk · Answer 1 · 2018-03-28T15:52:33.430

6

In practice you shouldn't use a single decision tree but a random forest (decision trees are prone to overfitting), at least if you're interested in high classification accuracy. RFs are not the easiest models to interpret, although there are some approaches to visualizing feature importance. You could also try gradient boosted trees and use the same methods for them.
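As a rough illustration of the feature-importance approaches mentioned above, here is a minimal sketch using scikit-learn. The synthetic data from `make_classification` and all parameter values are assumptions standing in for the real dataset; it compares the forest's built-in impurity-based importances with permutation importance measured on held-out data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 20 features, 5 informative.
# With shuffle=False the informative features are columns 0..4.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    n_redundant=0, shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Impurity-based importances: derived from the training data, normalized to sum to 1.
impurity_rank = np.argsort(forest.feature_importances_)[::-1]

# Permutation importance: mean drop in held-out accuracy when one column is shuffled.
perm = permutation_importance(forest, X_test, y_test, n_repeats=5, random_state=0)
perm_rank = np.argsort(perm.importances_mean)[::-1]

print("top 5 by impurity importance:   ", impurity_rank[:5])
print("top 5 by permutation importance:", perm_rank[:5])
```

Both rankings describe what the fitted model relies on, not what drives the outcome in the real world, so they are best read as a screening tool.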

Probably the easiest method to interpret is logistic regression: with LASSO (L1) regularization it is possible to drive the coefficients of irrelevant features to zero, giving a model that predicts the outcome from only a subset of the features (you'd have to run the model with different $\lambda$ values, though).
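A minimal sketch of that idea in scikit-learn, assuming synthetic data in place of the real dataset. Note that scikit-learn parameterizes the penalty through `C`, the inverse of the regularization strength (roughly $1/\lambda$), so a small `C` means a strong penalty; the grid of `C` values here is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 50 features, 5 informative.
X, y = make_classification(
    n_samples=2000, n_features=50, n_informative=5,
    n_redundant=0, random_state=0,
)
# Standardize so the L1 penalty treats all coefficients on the same scale.
X = StandardScaler().fit_transform(X)

nonzero_counts = {}
for C in [0.001, 0.01, 0.1, 1.0]:
    # liblinear is one of the solvers that supports the L1 penalty.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X, y)
    nonzero_counts[C] = int(np.count_nonzero(model.coef_))
    print(f"C={C}: {nonzero_counts[C]} of {X.shape[1]} coefficients are non-zero")
```

The features that survive at a strong penalty are the candidates for a small, readable model; in practice the value of `C` would be chosen by cross-validation rather than by eye.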

EDIT: Oliver Angelil mentioned a very important aspect of comparing/interpreting different models, see his comment below.

edited Mar 28 '18 at 15:52

answered Mar 28 '18 at 07:44

Jakub Bartczuk


Thank you for your input. Why do you say "In practice you shouldn't use decision trees"? On the other hand, the fact that RFs are not interpretable, or only interpretable to an extent, is a minus in itself. Interpretation is what we are looking for. – Victor Mar 28 '18 at 07:47

I've mentioned overfitting: the thing about trees is that if you don't limit depth or leaf size, or don't prune the tree, you're highly likely to overfit (it's not uncommon to get 100% accuracy on the training set and much lower test accuracy) – Jakub Bartczuk Mar 28 '18 at 07:50
So basically a decision tree with limited depth, a minimum leaf size, and pruning is better suited to a classification problem when it comes to interpretability? Deploying a random forest (creating 10+ decision trees) does reduce the interpretability. – Victor Mar 28 '18 at 07:52
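The trade-off discussed in these two comments can be sketched as follows. The data is synthetic with some label noise (`flip_y`), and the limits `max_depth=3` and `min_samples_leaf=50` are arbitrary values chosen for illustration, not recommendations. An unrestricted tree is compared with a restricted one, and the restricted tree is printed as readable rules with `export_text`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the real data (assumption), with 10% of labels flipped.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    flip_y=0.1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No limits: the tree keeps splitting until every training leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Restricted tree: at most 3 levels and at least 50 training samples per leaf.
small_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=50, random_state=0
).fit(X_train, y_train)

for name, tree in [("unrestricted", full_tree), ("restricted", small_tree)]:
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"train acc={tree.score(X_train, y_train):.3f}, "
        f"test acc={tree.score(X_test, y_test):.3f}"
    )

# The restricted tree is small enough to read as a handful of if/else rules.
feature_names = [f"x{i}" for i in range(X.shape[1])]
print(export_text(small_tree, feature_names=feature_names))
```

A gap between training and test accuracy for the unrestricted tree is the overfitting described above; the restricted tree gives up some training fit in exchange for a model with at most eight leaves that a person can inspect. Cost-complexity pruning via the `ccp_alpha` parameter is another way to shrink a tree.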

+1. If you want to interpret your model, you will need to restrict yourself to a very small model, certainly not one that uses an appreciable number of the 1000 features you have. I wouldn't trust myself to interpret more than 10 features in a model, nor trust anyone else trying to sell such an interpretation to me. In Python, you can [combine a logistic regression with an L1 regularization](https://stackoverflow.com/q/41639557/452096). – Stephan Kolassa Mar 28 '18 at 08:25

Elastic net regularisation is another common option that combines Lasso with Ridge, which can better handle multicollinearity :P – WavesWashSands Mar 28 '18 at 10:02

Fyi, feature importance is relevant to the model and not the real world: https://stats.stackexchange.com/q/336404/134691 – Oliver Angelil Mar 28 '18 at 10:39
