
I am using Isolation Forest for anomaly detection on multidimensional data. The algorithm detects anomalous records with good accuracy. Besides detecting anomalous records, I also need to find out which features contribute the most to a data point being anomalous. Is there any way to get this?

– Amar

2 Answers


SHAP values and the shap Python library can be used for this. shap has had built-in support for scikit-learn's IsolationForest since October 2019.

import shap
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Load your data and train the anomaly detector as usual
# (random data is used here only as a stand-in for your own X)
X = np.random.RandomState(0).normal(size=(1000, 5))
X_train, X_test = train_test_split(X, random_state=0)
est = IsolationForest(random_state=0)
est.fit(X_train)

# Compute SHAP values for the data to explain and plot a feature summary
X_explain = X_test
shap_values = shap.TreeExplainer(est).shap_values(X_explain)
shap.summary_plot(shap_values, X_explain)

Here is an example of a summary plot I made for an IsolationForest model trained on time-series data:
[SHAP summary plot for a time-series IsolationForest model]

You can also get SHAP dependence plots for a particular feature, or a plot showing the feature contributions for a single instance. Examples of both are given in the shap project README; a minimal sketch follows below.
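For instance, continuing from the snippet above (reusing est and X_explain; the feature index 0 and row 0 are arbitrary choices for illustration), a sketch might look like this:

explainer = shap.TreeExplainer(est)
shap_values = explainer.shap_values(X_explain)

# SHAP dependence plot for a single feature (index 0 chosen arbitrarily)
shap.dependence_plot(0, shap_values, X_explain)

# Contributions of each feature for one instance (row 0), rendered with matplotlib
shap.force_plot(explainer.expected_value, shap_values[0, :], X_explain[0, :], matplotlib=True)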

– Jon Nordby

One possible way of describing feature importance in unsupervised outlier detection is presented in Contextual Outlier Interpretation. As in the LIME approach, local linearity is assumed: by sampling data points around the outlier of interest, a classification problem is generated. The authors suggest applying an SVM with a linear kernel and using the estimated weights as feature importances.
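As a rough illustration of that local-surrogate idea (this is only a simplified sketch, not the exact algorithm from the paper; the synthetic data, sampling scale, neighbourhood sizes and choice of LinearSVC are all assumptions):

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 4))
X[0, 0] += 6.0  # make the first record an obvious outlier on feature 0

est = IsolationForest(random_state=0).fit(X)
outlier = X[0]

# Class 1: synthetic points sampled tightly around the outlier
around_outlier = outlier + rng.normal(scale=0.1, size=(50, X.shape[1]))

# Class 0: the outlier's nearest neighbours among the points the detector calls normal
inliers = X[est.predict(X) == 1]
context = inliers[np.argsort(np.linalg.norm(inliers - outlier, axis=1))[:50]]

# Linear surrogate separating the outlier's vicinity from its local context;
# the weight magnitudes indicate which features make this point anomalous
# (features should be on comparable scales, otherwise standardise first)
X_local = np.vstack([around_outlier, context])
y_local = np.array([1] * 50 + [0] * 50)
svm = LinearSVC().fit(X_local, y_local)
importance = np.abs(svm.coef_[0])
print(importance / importance.sum())

In this toy example the weight for feature 0 dominates, which matches how the outlier was constructed.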

  • I am wondering if you can use LIME to generate local explanations for each of the identified anomalies in IsolationForest? – FlyingPickle Oct 04 '20 at 14:37