My question is an extension of the question asked here. How does one identify the parity (sign) of a predictor/feature/variable's impact on the response/outcome in a data mining model? Is there a standard procedure for finding the 'direction' of impact after doing feature selection and deriving variable importance with methods such as regularized random forest or lasso/elastic net?

I know this question may sound quite naive, but I really wanted to know and I have searched SO and other materials but couldn't find a convincing answer.

KarthikS

1 Answer

  • I don't think there is a standard approach.
  • "Boosted Regression Trees for ecological modeling" is a commonly cited reference that briefly discusses some of these issues in the context of boosting. Partial dependence plots are available in many packages.
  • rminer is a package that uses sensitivity analysis to extract information from models. It is underused, and has the benefit that you can apply this methodology to almost any model.
  • Soren Welling, an active member on this site, authored the forestFloor package and goes into some depth on getting information out of black-box models in the following Stack Exchange answer: Getting Information out of Blackbox Models - RandomForest / XGBoost
  • I haven't looked at the ggforest package, but this also offers to open the black box.
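
To make the partial dependence idea concrete, here is a minimal, model-agnostic sketch in plain Python. It is not taken from any of the packages above. `black_box` is a made-up stand-in for a fitted model's predict function, and `partial_dependence` and `direction` are hypothetical helper names. The idea is to force one feature to each value on a grid, average the predictions over the data, and read the direction of impact from whether the resulting curve rises, falls, or does both.

```python
import random
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)        # copy so the original data is untouched
            modified[feature] = value   # overwrite only the feature under study
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve


def direction(curve, tol=1e-9):
    """Classify a partial dependence curve by the sign of its successive steps."""
    steps = [b - a for a, b in zip(curve, curve[1:])]
    rises = any(s > tol for s in steps)
    falls = any(s < -tol for s in steps)
    if rises and falls:
        return "mixed"
    if rises:
        return "positive"
    if falls:
        return "negative"
    return "flat"


def black_box(x):
    """Hypothetical stand-in for a fitted model; x[3] is deliberately ignored."""
    return 3.0 * x[0] - 2.0 * x[1] + x[2] ** 2


random.seed(0)
rows = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

for j in range(4):
    curve = partial_dependence(black_box, rows, j, grid)
    print(f"feature {j}: {direction(curve)}")
```

With a real model you would pass its prediction function in place of `black_box`. A "mixed" result is the case where a single sign is misleading and you need to look at the plotted curve itself.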
charles