In the rpart() routine for creating CART models, you specify the complexity parameter (cp) to which you want to prune your tree. I have seen two different recommendations for choosing this complexity parameter:
1. Choose the complexity parameter associated with the minimum cross-validated error. This method is recommended by Quick-R and HSAUR.
2. Choose the greatest complexity parameter whose estimated cross-validated error is still within one standard error of the minimum cross-validated error. This is my interpretation of the package documentation, which says, in reference to this plot: "A good choice of cp for pruning is often the leftmost value for which the mean lies below the horizontal line." (Both rules are sketched in code below.)
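For concreteness, here is a minimal sketch of how I compute the two candidate cp values from rpart's cross-validation table (`fit$cptable`). The kyphosis data, the growing threshold `cp = 0.001`, and `xval = 10` are just illustrative placeholders, not my actual problem:

```r
library(rpart)

# Grow a deliberately large tree so there is something to prune back
set.seed(1)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0.001, xval = 10))

cp_table <- fit$cptable

# Rule 1: cp at the minimum cross-validated error (xerror)
row_min <- which.min(cp_table[, "xerror"])
cp_min  <- cp_table[row_min, "CP"]

# Rule 2 (1-SE rule): largest cp whose xerror is within one standard
# error (xstd) of the minimum xerror
threshold <- cp_table[row_min, "xerror"] + cp_table[row_min, "xstd"]
cp_1se    <- max(cp_table[cp_table[, "xerror"] <= threshold, "CP"])

tree_min <- prune(fit, cp = cp_min)   # rule 1: typically larger tree
tree_1se <- prune(fit, cp = cp_1se)   # rule 2: typically smaller tree
```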
The two choices of cp produce quite different trees in my dataset.
It seems that the first method will always produce a more complex, potentially overfitted, tree. Are there other advantages, disadvantages, or recommendations in the literature that I should take into account when deciding which method to use? I can provide more information about my particular modelling problem if that would be useful, but I am trying to keep this question broad enough to be relevant to others.