
Is there a way to find an optimal (highly accurate) tree within a random forest?

The purpose is to run some samples manually through that optimal tree and see how the tree classifies a given sample.

I am using scikit-learn for data analysis and my model has ~100 trees. Is it possible to find the optimal tree and run some samples through it manually?

Thanks

  • Anant Gupta provided the correct answer. The purpose of extracting a "best tree", which only uses mtry = a proportion of all variables, seems unclear to me. You can also run some samples down the complete forest and get a bagged result for these samples – Björn Sep 03 '19 at 06:41

1 Answer


I think what you are asking is doable, but it defeats the purpose of having a random forest. A random forest is an ensemble model, where the results from multiple weak estimators are combined to form a strong estimator.

However, if you want to go ahead and do it, you can do it in the following manner:

  1. Choose a metric for evaluating the individual decision trees.
  2. Run that metric on the same dataset for every tree and pick the one with the best score:
    from sklearn.metrics import accuracy_score
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification
    import pandas as pd

    X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                               n_redundant=0, random_state=0, shuffle=False)
    n_estimators = 100
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=2,
                                 random_state=0)
    clf.fit(X, y)

    # Score every individual tree on the same (training) data
    estimatorAccuracy = []
    for curEstimator in range(n_estimators):
        estimatorAccuracy.append(
            [curEstimator,
             accuracy_score(y, clf.estimators_[curEstimator].predict(X))])

    # Rank the trees by accuracy, best first
    estimatorAccuracy = pd.DataFrame(estimatorAccuracy,
                                     columns=['estimatorNumber', 'Accuracy'])
    estimatorAccuracy.sort_values(inplace=True, by='Accuracy', ascending=False)

    # The tree with the highest accuracy
    bestDecisionTree = clf.estimators_[
        estimatorAccuracy.head(1)['estimatorNumber'].values[0]]
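Since the question's goal was to run samples through the selected tree, here is a minimal self-contained sketch of that follow-up step. It rebuilds the same forest as above, picks the best-scoring tree, and uses the tree's `predict` and `decision_path` methods to see which nodes a single sample visits (variable names here are illustrative, not from the code above):

```python
# Sketch: pick the tree with the best training accuracy, then trace
# one sample through it. Assumes the same toy data as the answer.
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(X, y)

# Same selection idea as above, written as a one-liner
best_tree = max(clf.estimators_,
                key=lambda est: accuracy_score(y, est.predict(X)))

sample = X[:1]                    # a single sample, shape (1, 4)
pred = best_tree.predict(sample)  # this individual tree's prediction

# decision_path returns a sparse indicator of the nodes the sample visits
node_indicator = best_tree.decision_path(sample)
visited = node_indicator.indices  # node ids from root down to the leaf
print(pred[0], visited)
```

Because `max_depth=2`, the visited path contains at most three nodes (root, one internal node, one leaf), which makes it easy to follow the tree's decision by hand.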

Anant Gupta