Using the famous Iris data set with the decision tree classifier from Julia's DecisionTree package, I get the following tree.

using RDatasets
using DecisionTree
iris = dataset("datasets", "iris")
features = convert(Array, iris[:, 1:4])
labels = convert(Array, iris[:, 5]);
model = build_tree(labels, features)
model = prune_tree(model, 0.9)

print_tree(model)
Feature 3, Threshold 3.0
L-> setosa : 50/50
R-> Feature 4, Threshold 1.8
    L-> Feature 3, Threshold 5.0
        L-> versicolor : 47/48
        R-> Feature 4, Threshold 1.6
            L-> virginica : 3/3
            R-> Feature 1, Threshold 7.2
                L-> versicolor : 2/2
                R-> virginica : 1/1
    R-> Feature 3, Threshold 4.9
        L-> Feature 1, Threshold 6.0
            L-> versicolor : 1/1
            R-> virginica : 2/2
        R-> virginica : 43/43

I can't really interpret the numbers after some of the branches, like "setosa : 50/50" or "virginica : 3/3".

Could somebody explain what those mean?

Istvan

1 Answer

The labels give the number of observations with the predicted majority class and the overall number of observations in that node.

Thus, "versicolor : 47/48" means that in the corresponding node there are 48 observations out of which 47 are versicolor. Consequently, one observation (of class virginica) is misclassified in that node.
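To make the arithmetic concrete, here is a minimal sketch in plain Julia that reproduces the "majority / total" label for a single leaf. It does not call DecisionTree at all: `leaf_summary` is a hypothetical helper written for illustration, and `node_labels` is a hand-built stand-in for the 48 observations that end up in that leaf (47 versicolor and 1 virginica, as described above).

```julia
# Hypothetical helper (not part of DecisionTree): given the class labels of
# the observations that fall into one leaf, return the majority class, the
# number of observations in that class, and the total number in the leaf.
function leaf_summary(node_labels)
    # Count how often each class occurs in the leaf.
    counts = Dict{String,Int}()
    for label in node_labels
        counts[label] = get(counts, label, 0) + 1
    end

    # Pick the class with the highest count; this is the leaf's prediction.
    majority, hits = "", 0
    for (cls, n) in counts
        if n > hits
            majority, hits = cls, n
        end
    end

    return majority, hits, length(node_labels)
end

# Stand-in for the leaf printed as "versicolor : 47/48".
node_labels = vcat(fill("versicolor", 47), fill("virginica", 1))

majority, hits, total = leaf_summary(node_labels)
println(majority, " : ", hits, "/", total)
println("misclassified in this node: ", total - hits)
```

A leaf such as "setosa : 50/50" is the pure case: every observation in the node belongs to the predicted class, so the difference between the two numbers is zero.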

Achim Zeileis