
Is there a way to find an optimal (highly accurate) tree within a random forest?

The purpose is to run some samples manually through that optimal tree and see how the tree classifies a given sample.

I am using scikit-learn for data analysis and my model has ~100 trees. Is it possible to find the optimal tree and run some samples through it manually?

Thanks

  • Anant Gupta provided the correct answer. The purpose of extracting a "best tree", which only uses mtry = a proportion of all variables, seems unclear to me. You can also run some samples down the complete forest and get a bagged result for these samples – Björn Sep 03 '19 at 06:41

1 Answer


I think what you are asking is doable, but it defeats the purpose of having a random forest. A random forest is an ensemble model, where the results from multiple weak estimators are combined to form a strong estimator.

However, if you want to go ahead and do it, you can do it in the following manner:

  1. Choose a metric for evaluating the individual decision trees.
  2. Run that metric on the same dataset for every tree and pick the one with the best score:
    from sklearn.metrics import accuracy_score
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification
    import pandas as pd

    X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                               n_redundant=0, random_state=0, shuffle=False)
    n_estimators = 100
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=2,
                                 random_state=0)
    clf.fit(X, y)

    # Score every individual tree on the same (training) data
    estimatorAccuracy = []
    for curEstimator in range(n_estimators):
        estimatorAccuracy.append(
            [curEstimator,
             accuracy_score(y, clf.estimators_[curEstimator].predict(X))])

    # Rank the trees by accuracy, best first
    estimatorAccuracy = pd.DataFrame(estimatorAccuracy,
                                     columns=['estimatorNumber', 'Accuracy'])
    estimatorAccuracy.sort_values(inplace=True, by='Accuracy', ascending=False)

    # The tree with the highest accuracy
    bestDecisionTree = clf.estimators_[
        estimatorAccuracy.head(1)['estimatorNumber'].values[0]]
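Since the question's goal was to run samples through the selected tree, here is a minimal self-contained sketch of that follow-up step. It rebuilds the same forest as above, picks the best-scoring tree, and uses the tree's `predict` and `decision_path` methods to see which nodes a single sample visits (variable names here are illustrative, not from the code above):

```python
# Sketch: pick the tree with the best training accuracy, then trace
# one sample through it. Assumes the same toy data as the answer.
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(X, y)

# Same selection idea as above, written as a one-liner
best_tree = max(clf.estimators_,
                key=lambda est: accuracy_score(y, est.predict(X)))

sample = X[:1]                    # a single sample, shape (1, 4)
pred = best_tree.predict(sample)  # this individual tree's prediction

# decision_path returns a sparse indicator of the nodes the sample visits
node_indicator = best_tree.decision_path(sample)
visited = node_indicator.indices  # node ids from root down to the leaf
print(pred[0], visited)
```

Because `max_depth=2`, the visited path contains at most three nodes (root, one internal node, one leaf), which makes it easy to follow the tree's decision by hand.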

Anant Gupta