3

I was wondering if anyone knew how the feature interaction strength is calculated in the catboost package. The documentation https://catboost.ai/docs/concepts/output-data_feature-analysis_feature-interaction-strength.html#output-data_feature-analysis_feature-interaction-strength__per-feature-interaction-strength does not explain exactly how it is calculated.

1 Answers1

3

Feature interaction strength is stated here as: $\text{feature_strength} = \sum_{leaf s} | \sum_{f_1=f_2} \text{Leaf_Value} - \sum_{f_1!=f_2} \text{Leaf_Value} |$.

Decoding this: The value of the feature interaction strength for each pair of features $f_1$ and $f_2$ reflects the sum of absolute differences (after being summed across all trees) between the leaves of the tree containing the interaction $f_1:f_2$ and the leaves of the tree not containing the interaction of features $f_1:f_2$. Notice that this definition of feature interaction does not work if we have "stumps". At least one tree-learner of the ensemble is required to have at least two different features in its respective splits; if not, the term $\sum_{f_1=f_2} \text{Leaf_Value}$ will evaluate to NA.

The above mentioned methodology is a straightforward way of measuring interaction but unfortunately, none of the Catboost reference papers mentions anything formal on feature interactions (at the time of writing this answer). For a more authoritative/formal treatment of feature interactions in GBMs, I would strongly looking into the concept of $H$-statistic as proposed by Friedman and Popescu (2008) Predictive Learning via Rule Ensembles. This work takes the concept of partial dependency plots (from Friendman's earlier work (2001) Greedy function approximation: a gradient boosting machine) and generalises it. In short, it quantifies the change that the PDP of feature $x_j$ has when feature $x_i$ was included or not in the construction of $x_j$ PDP. Notice that the $H$-statistic becomes quickly quite expensive to evaluate as we really have to compute marginal distributions across numerous features-pairs. The Catboost approach should be extremely faster in comparison.

usεr11852
  • 33,608
  • 2
  • 75
  • 117
  • This was very helpful. Thank you. I was reading the documentation and it states that oblivious trees are used, meaning the same variable and split is used at every level of the tree. Therefore, I don't fully understand how we can look at the differences between the leaves within a tree, seeing as they would all have that interaction present. I have one other question. What exactly do you mean by "the differences between the leaves of a tree" ? – Niamh Belton Mar 10 '19 at 16:05
  • The difference between the values at the leaves of a tree. – usεr11852 Mar 10 '19 at 23:42
  • It’s still not clear to me. My interpretation of the formula would be, that it is the sum of absolute differences over all leafs with/without the interaction (globally, not per tree). Why do you think it is _within a tree_? – Peter Jul 04 '19 at 15:26
  • Apologies but as Catboost (*once more*) changed its documentation, so I cannot refer to you directly to a passage. That said, the formula shown is only applicable within a tree as it clearly does not contain any tree indices. In practice, Catboost, might made the summation (or a summation like procedure) across trees but that was not documented. – usεr11852 Jul 04 '19 at 16:27
  • Ok. Thanks, seams I need to follow the _UTSL_ paradigm. – Peter Jul 05 '19 at 06:55
  • I checked the source, the sum is over all leafs in all trees. There is just one single map build over pairs of features. ( score[(f1,f2)] -> interactionScore ) – Peter Jul 05 '19 at 07:21
  • Cool, thanks for that! I will update to the answer. The "current" formula in the docs clearly sums over trees. – usεr11852 Jul 05 '19 at 08:13
  • The relevant places are [here](https://github.com/catboost/catboost/blob/b0599f91db78ac2ecd25d05602e294fa128b74dd/catboost/libs/fstr/feature_str.cpp#L112) and [here](https://github.com/catboost/catboost/blob/b0599f91db78ac2ecd25d05602e294fa128b74dd/catboost/libs/fstr/feature_str.cpp#L137) – Peter Jul 05 '19 at 08:37