`rpart` is an R package that provides routines for recursive partitioning and regression trees. It is frequently used for classification problems.
Questions tagged [rpart]
136 questions
52
votes
3 answers
What is Deviance? (specifically in CART/rpart)
What is "Deviance," how is it calculated, and what are its uses in different fields in statistics?
In particular, I'm personally interested in its uses in CART (and its implementation in rpart in R).
I'm asking this since the wiki-article seems…

Tal Galili
- 19,935
- 32
- 133
- 195
30
votes
4 answers
How to measure/rank "variable importance" when using CART? (specifically using {rpart} from R)
When building a CART model (specifically a classification tree) using rpart (in R), it is often interesting to know the importance of the various variables introduced to the model.
Thus, my question is: What common measures exist for…

Tal Galili
- 19,935
- 32
- 133
- 195
19
votes
3 answers
Regression tree algorithm with linear regression models in each leaf
Short version: I'm looking for an R package that can build decision trees where each leaf in the decision tree is a full linear regression model. AFAIK, the library rpart creates decision trees where the dependent variable is constant in each…

cheesus
- 521
- 1
- 3
- 10
16
votes
2 answers
Choosing complexity parameter in CART
In the rpart() routine to create CART models, you specify the complexity parameter to which you want to prune your tree. I have seen two different recommendations for choosing the complexity parameter:
Choose the complexity parameter associated…

half-pass
- 3,594
- 7
- 23
- 34
15
votes
2 answers
Partitioning trees in R: party vs. rpart
It's been a while since I looked at partitioning trees. Last time I did this sort of thing, I liked party in R (created by Hothorn). The idea of conditional inference via sampling makes sense to me. But rpart also had appeal.
In the current…

Peter Flom
- 94,055
- 35
- 143
- 276
12
votes
1 answer
Difference in implementation of binary splits in decision trees
I am curious about the practical implementation of a binary split in a decision tree - as it relates to levels of a categorical predictor $X_{j}$.
Specifically, I often will utilize some sort of sampling scheme (e.g. bagging, oversampling etc) when…

B_Miner
- 7,560
- 20
- 81
- 144
11
votes
2 answers
Organizing a classification tree (in rpart) into a set of rules?
Once a complex classification tree is constructed using rpart (in R), is there a way to organize the decision rules produced for each class? So instead of getting one huge tree, we get a set of rules for each of the classes?
(if so, how?)
Here…

Tal Galili
- 19,935
- 32
- 133
- 195
11
votes
4 answers
rpart complexity parameter confusion
I'm a little bit confused about the calculation of CP in the summary of an rpart object.
Take this example
df <- data.frame(x=c(1, 2, 3, 3, 3),
y=factor(c("a", "a", "b", "a", "b")),
method="class")
mytree<-rpart(y ~…

Ben
- 1,612
- 3
- 17
- 30
10
votes
1 answer
How to choose the number of splits in rpart()?
I have used rpart.control with minsplit=2, and got the following results from the rpart() function. In order to avoid overfitting the data, do I need to use splits 3 or splits 7? Shouldn't I use splits 7? Please let me know.
Variables actually used in…

samarasa
- 1,287
- 6
- 18
- 26
10
votes
2 answers
How to evaluate the goodness of fit for survival functions
I am a newcomer to survival analysis, although I have some knowledge of classification and regression.
For regression, we have MSE and R-square statistics. But how can we say that survival model A is superior to survival model B besides some kind…

floodking
- 323
- 2
- 7
9
votes
1 answer
Difference between weights and prior in rpart and how to use them
I have a question about the "weights" and "prior" in R's rpart function.
This question has been asked before here, but the answer doesn't quite make sense.
Currently I have very unbalanced data where the target is only 0.0066% of the whole dataset,…

Jason
- 91
- 1
- 2
- 5
9
votes
2 answers
How are CP (Cost Complexity) values calculated in RPART (or decision trees in general)
From what I understand, the cp argument to the rpart function helps pre-prune the tree in the same way as the minsplit or minbucket arguments. What I don't understand is how CP values are computed. For example
df<-data.frame(x=c(1,2,3,3,3,4),…

Ben
- 1,612
- 3
- 17
- 30
8
votes
1 answer
R-square from rpart model
How can I extract the R-square from a fitted rpart model?
rsq.rpart(fit)
plots the two graphs, but I simply want to extract the R-square value for the full tree.
I assume this is fairly obvious, but numerous searches didn't really lend anything…

Btibert3
- 1,154
- 1
- 13
- 23
8
votes
1 answer
how does rpart handle missing values in predictors?
From the ?rpart documentation -
na.action : the default action deletes all observations for which y is missing, but keeps those in which one or more predictors are missing.
How does it impute missing values in predictors?

Nishanth
- 266
- 1
- 2
- 7
8
votes
1 answer
rpart and the printcp function
I don't really understand how the columns "xerror" and "rel error" are calculated.
I found out that the printcp() function "gives cross-validation estimates of misclassification error (xerror), standard errors (xstd) of those estimates and the…

Giuseppe
- 1,211
- 3
- 14
- 23