By default, extraTrees does not use bootstrapping, according to the manual (see below). Because the splits are selected even more randomly than in a regular random forest, bootstrapping is not needed as much to decorrelate the trees.
How to implement it:
I was personally interested in making extraTrees compatible with my own forestFloor package, so some time ago I read the Java source code as well as a non-native Java programmer can. The inbag matrix needed to compute the OOB error is not exported or set as public. I emailed the author asking him to export the information I needed, such as break points and the inbag matrix. He was positive about doing that at some point, but I guess we both got away from it.
Once the inbag matrix is exported to the R side, it would only require a wrapper function of about 20 lines around predict.extraTrees to compute OOB-CV predictions.
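To make the wrapper idea concrete, here is a minimal sketch of the aggregation step, written in Java to match the source quoted below. It is not part of the extraTrees package: the class name, the method name, and both input matrices are hypothetical, and it assumes that an inbag matrix and per-tree predictions were somehow made available.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these names come from the extraTrees package.
class OobSketch {

    /**
     * inbag[t][i]    is true when sample i was used to train tree t (assumed input).
     * treePred[t][i] is the prediction of tree t for sample i (assumed input).
     * Returns, for each sample, the mean prediction over the trees that did
     * NOT train on it, or NaN when every tree trained on that sample.
     */
    static double[] oobPredictions(boolean[][] inbag, double[][] treePred) {
        int nTrees = inbag.length;
        int nSamples = inbag[0].length;
        double[] sum = new double[nSamples];
        int[] count = new int[nSamples];

        for (int t = 0; t < nTrees; t++) {
            for (int i = 0; i < nSamples; i++) {
                if (!inbag[t][i]) {          // tree t never saw sample i
                    sum[i] += treePred[t][i];
                    count[i]++;
                }
            }
        }

        double[] oob = new double[nSamples];
        for (int i = 0; i < nSamples; i++) {
            oob[i] = (count[i] == 0) ? Double.NaN : sum[i] / count[i];
        }
        return oob;
    }

    public static void main(String[] args) {
        // Two trees, three samples; values are made up for illustration.
        boolean[][] inbag = {
            {true,  false, true},
            {false, false, true},
        };
        double[][] treePred = {
            {1.0, 2.0, 3.0},
            {5.0, 4.0, 3.0},
        };
        System.out.println(Arrays.toString(oobPredictions(inbag, treePred)));
    }
}
```

The same logic would translate to a few lines of R once the inbag information is exposed; for classification, the mean would be replaced by a majority vote.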
Copied from the R reference manual for extraTrees 1.0.5:
    ## Default S3 method:
    extraTrees(x, y,
               ntree=500,
               mtry = if (!is.null(y) && !is.factor(y))
                        max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x))),
               nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
               numRandomCuts = 1,
               evenCuts = FALSE,
               numThreads = 1,
               quantile = F,
               weights = NULL,
               subsetSizes = NULL,
               subsetGroups = NULL,
               tasks = NULL,
               probOfTaskCuts = mtry / ncol(x),
               numRandomTaskCuts = 1,
               na.action = "stop",
               ...)
subsetSizes: subset size (one integer) or subset sizes (vector of integers, requires subsetGroups), if supplied every tree is built from a random subset of size subsetSizes. NULL means no subsetting, i.e. all samples are used.
subsetGroups: list specifying subset group for each sample: from samples in group g, each tree will randomly select subsetSizes[g] samples.
Bonus info:
extraTrees does not sample with replacement; see the source code in extraTrees_1.0.5.tar.gz, file AbstractTrees.java, lines 549-578. The algorithm shuffles the input samples for a tree and takes the first subsetSizes elements of the shuffled vector. So, unless it is implemented separately, the user cannot have OOB sampling together with sampling with replacement:
    protected int[] getInitialSamples(Random random) {
        if (subsetSizes == null) {
            return seq( input.nrows() );
        }
        if (subsetSizes.length == 1) {
            ArrayList<Integer> allIds = arrayToList( seq( input.nrows() ) );
            ShuffledIterator<Integer> shuffle = new ShuffledIterator<Integer>(allIds, random);
            int[] subset = new int[ subsetSizes[0] ];
            for (int i=0; i < subset.length; i++) {
                subset[i] = shuffle.next();
            }
            return subset;
        }
        // selecting random samples from each subset:
        int[] subset = new int[ sum(subsetSizes) ];
        int i = 0;
        for (int b=0; b < subsetSizes.length; b++) {
            ArrayList<Integer> ids = arrayToList( subsetElems[b] );
            ShuffledIterator<Integer> shuffle = new ShuffledIterator<Integer>(ids, random);
            // filling with elements from subset[b]:
            for (int n = i + subsetSizes[b]; i < n; i++) {
                subset[i] = shuffle.next();
            }
        }
        return subset;
    }
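As a rough illustration of what the single-subset branch above amounts to, the following standalone sketch shuffles the sample ids, keeps the first k, and marks the rest as out-of-bag for that tree. It is an approximation under stated assumptions, not the package's code: it uses `Collections.shuffle` from the standard library in place of the package's `ShuffledIterator`, uses 0-based ids, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: mimics "shuffle and take the first k" with the JDK only.
class SubsampleSketch {

    /** Shuffle ids 0..n-1 and keep the first k: sampling WITHOUT replacement. */
    static int[] sampleWithoutReplacement(int n, int k, Random random) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("k must be between 0 and n");
        }
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ids.add(i);
        }
        Collections.shuffle(ids, random);
        int[] subset = new int[k];
        for (int i = 0; i < k; i++) {
            subset[i] = ids.get(i);   // every id can appear at most once
        }
        return subset;
    }

    /** One row of an inbag matrix: true for ids in the subset, false for OOB ids. */
    static boolean[] inbagRow(int n, int[] subset) {
        boolean[] inbag = new boolean[n];
        for (int id : subset) {
            inbag[id] = true;
        }
        return inbag;
    }

    public static void main(String[] args) {
        int n = 10;
        int k = 6;
        int[] subset = sampleWithoutReplacement(n, k, new Random(42));
        boolean[] inbag = inbagRow(n, subset);
        int oob = 0;
        for (boolean b : inbag) {
            if (!b) oob++;
        }
        // With no duplicates, exactly n - k samples are out-of-bag for this tree.
        System.out.println("inbag: " + k + ", out-of-bag: " + oob);
    }
}
```

Because no id is drawn twice, setting subsetSizes below the number of rows leaves exactly the unselected rows out-of-bag for each tree, whereas a bootstrap would draw n ids with replacement and leave a varying number out.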