I am following Hastie & Tibshirani's ISLR.

In Chapter 8 they introduce Bagging, Random Forests and Boosting.

To compare the models, they plot a curve of test error vs. number of trees.

Various goodness of fit measures are mentioned: RSE/MSE (for regression), Gini index, cross-entropy and classification error rate (for classification).

They never explicitly reference deviance. I looked for a definition in the question "What is Deviance? (specifically in CART/rpart)".

While helpful, it does not really tell me how it relates to the measures mentioned above. If I understand correctly: deviance is a measure of distance between two models.

  • In regression, deviance equals RSS (why?). Specifically, it is the distance between the candidate model and a perfectly fitting model, so the lower the better.
  • In classification, the sum of the Gini index or cross-entropy over all trees, compared against a perfectly fitted model, gives the deviance of predicted vs. observed (how?).
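To make my current understanding concrete, here is a minimal sketch of what I think is going on. It assumes deviance means 2 × (log-likelihood of a saturated, perfectly fitting model − log-likelihood of the candidate model), a Gaussian likelihood with the variance fixed at 1 for regression, and a Bernoulli likelihood for binary classification. The function names and the toy numbers are my own and do not come from ISLR or rpart.

```python
import numpy as np

def rss(y, y_hat):
    # Residual sum of squares
    return np.sum((y - y_hat) ** 2)

def gaussian_deviance(y, y_hat, sigma2=1.0):
    # Assumption: known variance sigma2 (1.0 by default).
    n = len(y)
    const = -0.5 * n * np.log(2 * np.pi * sigma2)
    ll_model = const - 0.5 * np.sum((y - y_hat) ** 2) / sigma2
    ll_saturated = const  # saturated model predicts every y exactly
    return 2 * (ll_saturated - ll_model)

def bernoulli_deviance(y, p):
    # For 0/1 labels the saturated model has log-likelihood 0,
    # so the deviance is just -2 * log-likelihood of the candidate.
    return -2 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def mean_cross_entropy(y, p):
    # Average cross-entropy (log loss) per observation
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy regression data (made up for illustration)
y = np.array([1.0, 2.0, 4.0, 7.0])
y_hat = np.array([1.5, 1.5, 5.5, 5.5])
rss_val = rss(y, y_hat)
gauss_dev = gaussian_deviance(y, y_hat)

# Toy classification data (made up for illustration)
labels = np.array([1.0, 0.0, 1.0, 1.0])
probs = np.array([0.75, 0.25, 0.75, 0.75])
bern_dev = bernoulli_deviance(labels, probs)
ce = mean_cross_entropy(labels, probs)

print(rss_val, gauss_dev)              # the two values should agree
print(bern_dev, 2 * len(labels) * ce)  # deviance vs. 2 * n * mean cross-entropy
```

If this sketch is right, then under these assumptions the regression deviance is exactly RSS, and the classification deviance is the cross-entropy summed over observations and multiplied by 2. I do not see how the Gini index would fit into the same picture.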

Can someone explain how all of these relate to the "MSE" used in the book, in both the regression and the classification context? And why does the ISLR book not use deviance?
