I am following Hastie & Tibshirani's ISLR.
In Chapter 8 they introduce Bagging, Random Forests and Boosting.
To compare the models, they plot a curve of test error vs. the number of trees.
Various goodness of fit measures are mentioned: RSE/MSE (for regression), Gini index, cross-entropy and classification error rate (for classification).
They never explicitly reference deviance. I looked here for a definition: What is Deviance? (specifically in CART/rpart).
While helpful, that answer does not really tell me how deviance relates to the measures mentioned above. If I understand correctly, deviance is a measure of the distance between two models.
- For regression, deviance equals RSS (why?). Specifically, it is the distance between the candidate model and a perfectly fitted (saturated) model, so lower is better.
- For classification, summing the cross-entropy (or Gini index) over all observations, relative to a perfectly fitted model, gives the deviance of predicted vs. observed (how?).
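To make the two bullets concrete, here is a small numerical sketch of my understanding, with made-up data. It assumes a Gaussian model with known σ = 1 for the regression case, and a Bernoulli model for the classification case (where the saturated model assigns probability 1 to each observed label, so its log-likelihood is 0):

```python
import numpy as np

# --- Regression: Gaussian deviance vs. RSS (sigma assumed = 1) ---
y = np.array([1.0, 2.0, 3.0, 4.0])        # toy observations
y_hat = np.array([1.1, 1.9, 3.2, 3.7])    # toy fitted values

rss = np.sum((y - y_hat) ** 2)

# Gaussian log-likelihood with sigma = 1; the saturated model fits y exactly
n = len(y)
loglik_model = -0.5 * np.sum((y - y_hat) ** 2) - 0.5 * n * np.log(2 * np.pi)
loglik_sat = -0.5 * n * np.log(2 * np.pi)

deviance_reg = 2 * (loglik_sat - loglik_model)
assert np.isclose(deviance_reg, rss)  # deviance coincides with RSS here

# --- Classification: Bernoulli deviance vs. summed cross-entropy ---
y_cls = np.array([1, 0, 1, 1])            # toy 0/1 labels
p_hat = np.array([0.9, 0.2, 0.7, 0.6])    # toy predicted probabilities

cross_entropy = -np.sum(
    y_cls * np.log(p_hat) + (1 - y_cls) * np.log(1 - p_hat)
)
# Saturated model has log-likelihood 0, so deviance is twice the cross-entropy
deviance_cls = 2 * cross_entropy
```

If this sketch is right, then (up to the factor of 2 and the variance scaling) minimizing deviance is the same as minimizing RSS/MSE in regression and the same as minimizing total cross-entropy in classification, which is the connection I am trying to confirm.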
Can someone explain how all of these relate to the book's use of MSE, in both the regression and classification contexts? And why doesn't ISLR use deviance?