
I'm considering a leaf of a decision tree that consists of object-label pairs $(x_{1}, y_{1}), \dots, (x_{n}, y_{n})$.

The prediction $\hat{y}$ of this leaf is defined as the value that minimizes the total loss over the leaf's training samples.

I have to find the optimal prediction in the leaf for a classification tree with $K$ classes, i.e. $y_{i} \in \{1, \dots, K\}$, and zero-one loss $\mathcal{L}(y, \hat{y}) = \left[y \neq \hat{y}\right]$.

This is all that is given in the task. Does anyone have an idea of how to approach it?

ghxk

1 Answer


A leaf has to commit to a single class as its prediction for all of its data points; therefore, under zero-one loss, the leaf should predict the class that occurs most often (the majority class).
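A minimal sketch of this rule in Python, assuming the leaf's labels arrive as a plain list (`leaf_prediction` is a hypothetical helper name, not from any particular library):

```python
from collections import Counter

def leaf_prediction(labels):
    """Optimal prediction under zero-one loss: the majority class in the leaf."""
    counts = Counter(labels)
    # most_common(1) returns [(class, count)]; ties are broken arbitrarily
    return counts.most_common(1)[0][0]

print(leaf_prediction([1, 2, 2, 3, 2, 1]))  # -> 2
```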

bonfab
  • This makes sense, but is a bit brief. Can you expand on this? How does it minimize OP's zero-one loss? – Sycorax Dec 09 '21 at 22:52
  • @bonfab: I agree with the comment above; it makes sense, but I would also be really interested in an expansion of the answer ;) – ghxk Dec 10 '21 at 07:13
  • Let $n$ be the number of data points at the leaf. If you choose class $c_i$ with prevalence $n_i$, then the loss is $n - n_i$. However, if there is a class $c_j$ that occurs more often, $n_j > n_i$, then choosing $c_j$ yields a smaller loss: $n - n_j < n - n_i$. (A numeric check of this is sketched below the comments.) – bonfab Dec 10 '21 at 18:15
  • Thanks @bonfab, it really makes sense now. Do you also know the answer if, under the same conditions as in the question, I had to find the **optimal prediction** in the leaf for a regression tree, i.e. $y_{i} \in \mathbb{R}$, and squared percentage error loss $\mathcal{L}(y, \hat{y}) = \frac{\left(y - \hat{y}\right)^{2}}{y^{2}}$? – ghxk Dec 13 '21 at 06:50
  • It's good that you [posted a separate new question for the squared percentage loss](https://stats.stackexchange.com/q/555901/1352). That makes it easier for future generations to find it than if it is "hidden" in a comment on a question about a different loss. – Stephan Kolassa Dec 13 '21 at 08:58
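A quick numeric check of the argument in the comment above, with made-up example labels:

```python
from collections import Counter

labels = [1, 2, 2, 3, 2, 1]            # n = 6 points in the leaf
n = len(labels)
counts = Counter(labels)               # n_c for each class c

# Predicting class c for every point costs n - n_c under zero-one loss.
losses = {c: n - n_c for c, n_c in counts.items()}
print(losses)                          # {1: 4, 2: 3, 3: 5}
print(min(losses, key=losses.get))     # 2: the majority class minimizes the loss
```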