14

A colleague in my office said to me today, "Tree models aren't good because they get caught by extreme observations".

A search here resulted in this thread that basically supports the claim.

Which leads me to the question: under what circumstances can a CART model be robust, and how is that shown?

Tal Galili

3 Answers

15

No, not in their present forms. The problem is that convex loss functions cannot be made robust to contamination by outliers (this has been a well-known fact since the 1970s but keeps being rediscovered periodically; see, for instance, this paper for one recent such rediscovery):

http://www.cs.columbia.edu/~rocco/Public/mlj9.pdf

Now, in the case of regression trees, one can exploit the fact that CART uses marginals (or, alternatively, univariate projections): one can think of a version of CART where the s.d. criterion is replaced by a more robust counterpart (the MAD or, better yet, the Qn estimator).
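
To make the substitution concrete, here is a minimal sketch in R (not the rpart internals) of a greedy one-variable split search in which the s.d. criterion is swapped for a robust scale estimate; the helper best_split() and the toy data are invented for illustration, and Qn() is assumed to come from the robustbase package.

```r
## Minimal illustration only (not how rpart builds trees): a greedy
## one-variable split search where the usual within-node s.d. is replaced
## by a robust scale estimate.
library(robustbase)  # for Qn()

best_split <- function(x, y, scale_fun = Qn, min_size = 5) {
  cuts <- sort(unique(x))
  cuts <- (head(cuts, -1) + tail(cuts, -1)) / 2          # candidate midpoints
  score <- sapply(cuts, function(s) {
    left  <- y[x <= s]
    right <- y[x >  s]
    if (length(left) < min_size || length(right) < min_size) return(Inf)
    # size-weighted robust scale of the two children plays the role of the
    # usual s.d.-based impurity
    length(left) * scale_fun(left) + length(right) * scale_fun(right)
  })
  list(cut = cuts[which.min(score)], score = min(score))
}

## Toy data with a jump at x = 0.5 and a few gross response outliers
set.seed(1)
x <- runif(200)
y <- ifelse(x < 0.5, 0, 3) + rnorm(200, sd = 0.3)
y[sample(200, 5)] <- 50

best_split(x, y, scale_fun = sd)   # classical criterion
best_split(x, y, scale_fun = Qn)   # robust counterpart (mad() would also do)
```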

Edit:

I recently came across an older paper implementing the approach suggested above (using a robust M-estimator of scale instead of the MAD). This will impart robustness to "y" outliers to CART/RFs (but not to outliers located in the design space, which will affect the estimates of the model's hyper-parameters). See:

Galimberti, G., Pillati, M., & Soffritti, G. (2007). Robust regression trees based on M-estimators. Statistica, LXVII, 173–190.

user603
  • Thank you kwak. This article seems to be talking about boosting methods. Do the results they present hold for the simple classifier case of a CART model? (on the surface it sounds like it, but I didn't go through the article enough to really know) – Tal Galili Sep 06 '10 at 15:28
  • The result they present holds for any convex loss function, and was initially discussed by Tukey. To sum things up, the measure of spread (Gini or entropy) used to quantify the quality of a node is sensitive to contamination by outliers (i.e. observations that are mislabeled in the dataset). This problem affects both the building and the pruning stage. Contamination of a dataset by observations with wrongly imputed labels will typically cause the resulting tree to be much too complex (you can check this rather easily by yourself). – user603 Sep 06 '10 at 15:38
  • Thank you Kwak! And is there no loss function that is robust? – Tal Galili Sep 06 '10 at 16:11
  • No *convex* loss function. See the article "A fast algorithm for the minimum covariance determinant estimator" for an example of what can be done with non-convex loss functions (although not related to classification, the article is worth a read). – user603 Sep 06 '10 at 16:14
  • It is very enjoyable to ask questions when getting answers like the ones you give - thank you! Last question: do you know of an easy way to implement such a loss function in R's CART? (rpart) – Tal Galili Sep 06 '10 at 16:29
  • The problem you point out seems easier to solve for regression than for classification trees. I do not know how to change the spread criterion (from s.d. to Qn) in R, but since Qn (and the MAD) are implemented it should not be difficult. Best (let us know whether you have any success with a 'robustified' version). – user603 Sep 06 '10 at 16:40
  • @Tal CART is equivalent to boosting of a "pivot classifier" (the criterion that sits in each tree node, like some attribute greater than something, or some attribute value in some set). – Sep 06 '10 at 17:47
  • True. Since I was intending to try this on regression trees, I'll have a go at it in the next few weeks and will come back to report if something nice comes out of it. Best. – Tal Galili Sep 06 '10 at 17:49
6

You might consider using Breiman's bagging or random forests. One good reference is Breiman's "Bagging Predictors" (1996); the methods are also summarized in Clifton Sutton's "Classification and Regression Trees, Bagging, and Boosting" in the Handbook of Statistics.

You can also see Andy Liaw and Matthew Wiener's R News article on the randomForest package.
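
For illustration only, a minimal usage sketch of the randomForest package in R; the data, the injected outliers, and the settings below are invented and are not a recipe for robustness.

```r
## Minimal usage sketch of the randomForest package; data, outliers and
## settings are invented for illustration.
library(randomForest)
set.seed(42)

dat <- data.frame(x1 = runif(300), x2 = runif(300))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(300, sd = 0.2)
dat$y[sample(300, 6)] <- 25        # a few gross response outliers

rf <- randomForest(y ~ x1 + x2, data = dat, ntree = 500)
print(rf)                          # out-of-bag (OOB) error estimate
```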

Shane
  • Not to spoil the party, but how random forests are supposed to provide robustness to contamination by outliers is a mystery. – user603 Sep 06 '10 at 15:27
  • @kwak Still, this is a good answer; trees in an RF don't see the entire set, so many of them will not be contaminated. Even better, tracking which leaves the OOB cases land in can be used to find mislabeled objects and eliminate them. (As I recall, this is mentioned in Breiman's paper about RF.) – Sep 06 '10 at 17:50
  • The problem is that outliers will make some 'bad' (i.e. contaminated) trees look better than good (uncontaminated) ones. This is called the masking effect and is easy to replicate with simulated data. The problem comes about because the criterion you use to evaluate trees is not itself robust to outliers. I know I'm starting to sound like a fundamentalist mullah, but unless each and every tool you use is made robust, your procedure can be shown to be sensitive (at one level or another) to outliers (and hence not robust). – user603 Sep 07 '10 at 22:20
3

If you check out the 'gbm' package in R (generalized gradient boosting), the boosting can use loss functions that are not necessarily mean squared error. This shows up in the 'distribution' argument to the function 'gbm()'. Thus the elaboration of the tree via boosting will be resistant to outliers, similar to how M-estimators work.

You might start here.
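
As a hedged illustration of the 'distribution' argument mentioned above, the sketch below fits the same contaminated data under squared-error ("gaussian") and absolute ("laplace") loss; the data and tuning values are made up.

```r
## Sketch of the 'distribution' argument: the same contaminated data fitted
## under squared-error ("gaussian") and absolute ("laplace") loss.
library(gbm)
set.seed(7)

dat <- data.frame(x = runif(500))
dat$y <- sin(2 * pi * dat$x) + rnorm(500, sd = 0.2)
dat$y[sample(500, 10)] <- 20       # contaminate the response

fit_sq  <- gbm(y ~ x, data = dat, distribution = "gaussian",
               n.trees = 1000, shrinkage = 0.05, interaction.depth = 2)
fit_abs <- gbm(y ~ x, data = dat, distribution = "laplace",
               n.trees = 1000, shrinkage = 0.05, interaction.depth = 2)

grid  <- data.frame(x = seq(0, 1, length.out = 100))
p_sq  <- predict(fit_sq,  grid, n.trees = 1000)
p_abs <- predict(fit_abs, grid, n.trees = 1000)  # less pulled by the outliers
```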

Another approach would be to build the tree the usual way (partitions based on SSE), but prune the tree using cross-validation with a robust measure of fit. I think xpred.rpart in the rpart package will give cross-validated predictions (for a variety of different tree complexities), to which you can then apply your own measure of error, such as the mean absolute error.
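
Here is a rough sketch of that pruning idea, assuming the rpart package; the data are invented, and mean absolute error is used as the cross-validated criterion purely as an example.

```r
## Rough sketch: grow a deep tree, get cross-validated predictions per
## complexity value from xpred.rpart(), score them with mean absolute error,
## then prune at the best cp.
library(rpart)
set.seed(3)

dat <- data.frame(x1 = runif(400), x2 = runif(400))
dat$y <- ifelse(dat$x1 > 0.5, 2, 0) + dat$x2 + rnorm(400, sd = 0.3)
dat$y[sample(400, 8)] <- 30                      # response outliers

fit <- rpart(y ~ x1 + x2, data = dat, cp = 0.001)

xp      <- xpred.rpart(fit, xval = 10)           # one column per cp value
mae     <- apply(xp, 2, function(p) mean(abs(dat$y - p)))
best_cp <- as.numeric(colnames(xp)[which.min(mae)])
pruned  <- prune(fit, cp = best_cp)
```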

AlaskaRon