The randomForest implementation does not allow a sample size larger than the number of observations, even when sampling with replacement. Why is this?
Works fine:
rf <- randomForest(Species ~ ., iris, sampsize=c(1, 1, 1), replace=TRUE)
rf <- randomForest(Species ~ ., iris, sampsize=3, replace=TRUE)
What I want to do:
rf <- randomForest(Species ~ ., iris, sampsize=c(51, 1, 1), replace=TRUE)
Error in randomForest.default(m, y, ...) :
sampsize can not be larger than class frequency
A similar error occurs without a stratified sample:
rf <- randomForest(Species ~ ., iris, sampsize=151, replace=TRUE)
Error in randomForest.default(m, y, ...) : sampsize too large
iris has 150 observations, 50 per class, so both calls ask for more rows than exist. Since I expected the method to draw bootstrap samples when given replace=TRUE, I did not expect this limit in either case.
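For comparison, base R's `sample()` has no such limit when `replace = TRUE`. The sketch below uses only base R, not randomForest's internal sampler, and shows that drawing 51 rows from the 50 setosa rows of iris succeeds as a bootstrap draw:

```r
# Row indices of the 50 setosa observations in iris
setosa_rows <- which(iris$Species == "setosa")

set.seed(1)
# Bootstrap draw: with replace = TRUE the sample may exceed the class size.
# (With replace = FALSE, sample() would stop with an error here.)
boot_rows <- sample(setosa_rows, size = 51, replace = TRUE)

length(boot_rows)                # 51 indices drawn
all(boot_rows %in% setosa_rows)  # every index is still a setosa row
```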
My objective is to use this with the stratified sampling option, in order to draw a sufficiently large sample from a relatively rare class.
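To make the objective concrete, the per-class sample I am after can be drawn by hand in base R. `stratified_boot()` is a hypothetical helper of my own, not a randomForest function; it only illustrates what `sampsize=c(51, 1, 1)` with `replace=TRUE` would mean if the size were allowed to exceed the class frequency:

```r
# Hypothetical helper (not part of randomForest): draw a stratified
# bootstrap sample of row indices, sampling with replacement within
# each class, so a class's sample size may exceed its frequency.
stratified_boot <- function(y, sizes) {
  y <- as.factor(y)
  stopifnot(length(sizes) == nlevels(y))
  idx <- lapply(seq_len(nlevels(y)), function(k) {
    rows <- which(y == levels(y)[k])
    # sample.int() indexes into rows, avoiding sample()'s
    # special case when given a single number
    rows[sample.int(length(rows), sizes[k], replace = TRUE)]
  })
  unlist(idx)
}

set.seed(42)
idx <- stratified_boot(iris$Species, c(51, 1, 1))
table(iris$Species[idx])  # 51 setosa, 1 versicolor, 1 virginica
```

Note that fitting a forest on `iris[idx, ]` would fix one resample for the whole forest, which is not the same as drawing a fresh stratified sample for each tree.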