I see two questions here.
1) What is the difference between weights and parms in rpart?
If you look at the code, the weights argument is passed on to model.frame, so it is applied to each observation (row) of your dataset, just like in lm.
    if (is.data.frame(model)) {
        m <- model ## <---- m is defined here
        model <- FALSE
    }
    else {
        indx <- match(c("formula", "data", "weights", "subset"),
                      names(Call), nomatch = 0L)
        if (indx[1] == 0L)
            stop("a 'formula' argument is required")
        temp <- Call[c(1L, indx)]
        temp$na.action <- na.action
        temp[[1L]] <- quote(stats::model.frame) ## <---- passed to model.frame
        m <- eval.parent(temp)
    }
    Terms <- attr(m, "terms")
    if (any(attr(Terms, "order") > 1L))
        stop("Trees cannot handle interaction terms")
    Y <- model.response(m)
    wt <- model.weights(m) ## <---- used as observation weights
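To make the per-observation behaviour concrete, here is a minimal sketch using the kyphosis data that ships with rpart. The weighting scheme (counting each "present" row three times) and the names d, w and fit_w are only illustrative assumptions, not something from the question.

```r
library(rpart)

# kyphosis ships with rpart; Kyphosis is a factor with levels absent/present.
d <- kyphosis

# Hypothetical case weights: one value per row of the data.
d$w <- ifelse(d$Kyphosis == "present", 3, 1)

# weights is looked up in data, so each row carries its own weight.
fit_w <- rpart(Kyphosis ~ Age + Number + Start, data = d,
               weights = w, method = "class")

# The root node stores the row count (n) and the sum of case weights (wt).
fit_w$frame[1, c("n", "wt")]
```

The weights vector has to be as long as the data, because every row gets its own value.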
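On the other hand, parms holds parameters for the splitting function. For classification these include the class prior probabilities, which is the usual way to deal with unbalanced class sizes. I believe this is what you are looking for.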
2) How to use the parms argument?
If you look at the description of parms:
For classification splitting, the list can contain any of: the vector of prior probabilities (component prior), ...
Hence, you want to store your prior probability vector in a list under the name "prior". The vector needs one entry per class, the entries must be positive and sum to 1, and their order must match the output of levels(data$y), where y is your response variable (a factor). For example, you might want to try something like the following:
fit <- rpart(y ~ x1 + x2 + x3, data = data, parms = list(prior = c(0.000066, 1 - 0.000066)))
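Since getting the order wrong silently swaps the priors, one way to be safe is to name the vector by class and reorder it with levels() before passing it in. This is only a sketch on the kyphosis data that ships with rpart; the prior values and the names prior_by_class, prior and fit_p are made up for illustration.

```r
library(rpart)

# Check the level order of the response first.
lev <- levels(kyphosis$Kyphosis)   # "absent" "present"

# Hypothetical priors, keyed by class name in arbitrary order.
prior_by_class <- c(present = 0.75, absent = 0.25)

# Reorder to match levels() and drop the names.
prior <- unname(prior_by_class[lev])

# Priors must be positive, sum to 1, and have one entry per class.
stopifnot(length(prior) == length(lev), all(prior > 0),
          isTRUE(all.equal(sum(prior), 1)))

fit_p <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", parms = list(prior = prior))
```

Keying the priors by class name means the call does not depend on remembering which level comes first.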