R extraTrees Out-Of-Bag error estimate

Question

With the R package 'randomForest', I can call predict(myModel) without new data to get the out-of-bag predictions, from which the error can be estimated. Alternatively, I can read myModel$mse.

I would like to do the same with extremely randomized trees, as provided by the package 'extraTrees'. This did not work:

predict(erf)
Error in predict.extraTrees(erf) : 
  argument "newdata" is missing, with no default

And erf$... does not offer anything relevant. Is there a workaround, or another package that does this?

...by default, extraTrees uses no bootstrapping and therefore has no out-of-bag samples. – Soren Havelund Welling, Apr 04 '16 at 15:47
@SorenHavelundWelling Thanks! If you write it up as an answer, I can accept it! – RUser4512, Apr 06 '16 at 13:38

score 3 · Accepted Answer · answered Apr 06 '16 at 19:09

By default, extraTrees does not bootstrap, according to the manual (see below). Because the splits are selected even more randomly than in a regular random forest, bootstrapping is not needed as much to decorrelate the trees.

How to implement it: I was personally interested in making extraTrees compatible with my own forestFloor package, so some time ago I read the Java source code as well as a non-native Java programmer can. The inbag matrix needed to compute the OOB error is not exported or declared public. I emailed the author to ask him to export the information I needed, such as break points and the inbag matrix. He was positive about doing that at some point, but I guess we both drifted away from it.

Once the inbag matrix is exported to the R side, it would only take a wrapper function of about 20 lines around predict.extraTrees to compute OOB-CV predictions.
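
To make the idea concrete, here is a minimal sketch (in Java, like the package internals, rather than R) of what such a wrapper would have to compute once an inbag matrix is available. All names here (OobSketch, oobPredictions, oobMse, inbag, treePredictions) are hypothetical and are not part of extraTrees. It also assumes that per-tree predictions can be obtained as a plain matrix, which the package may not offer directly.

```java
import java.util.Arrays;

// Hypothetical sketch: none of these names exist in extraTrees.
final class OobSketch {

    /**
     * inbag[t][i] is true when sample i was used to train tree t.
     * treePredictions[t][i] is the prediction of tree t for sample i.
     * Returns, per sample, the mean prediction over the trees that did NOT
     * see that sample; NaN when the sample was in-bag for every tree.
     */
    static double[] oobPredictions(boolean[][] inbag, double[][] treePredictions) {
        int nTrees = inbag.length;
        int nSamples = nTrees == 0 ? 0 : inbag[0].length;
        double[] sum = new double[nSamples];
        int[] count = new int[nSamples];
        for (int t = 0; t < nTrees; t++) {
            for (int i = 0; i < nSamples; i++) {
                if (!inbag[t][i]) {           // tree t never saw sample i
                    sum[i] += treePredictions[t][i];
                    count[i]++;
                }
            }
        }
        double[] oob = new double[nSamples];
        for (int i = 0; i < nSamples; i++) {
            oob[i] = count[i] == 0 ? Double.NaN : sum[i] / count[i];
        }
        return oob;
    }

    /** Mean squared error over the samples that have an OOB prediction. */
    static double oobMse(double[] oob, double[] y) {
        double sse = 0.0;
        int n = 0;
        for (int i = 0; i < oob.length; i++) {
            if (!Double.isNaN(oob[i])) {
                double d = oob[i] - y[i];
                sse += d * d;
                n++;
            }
        }
        return n == 0 ? Double.NaN : sse / n;
    }

    public static void main(String[] args) {
        // Toy data: 2 trees, 3 samples.
        boolean[][] inbag = {{true, false, false}, {false, true, false}};
        double[][] preds = {{1.0, 2.0, 3.0}, {4.0, 5.0, 7.0}};
        double[] y = {3.0, 2.0, 5.0};
        double[] oob = oobPredictions(inbag, preds);
        System.out.println(Arrays.toString(oob));
        System.out.println(oobMse(oob, y));
    }
}
```

For a regression forest this plays the role of predict(myModel) and myModel$mse in randomForest; for classification the averaging step would be replaced by a vote.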

Copied from the R reference manual for extraTrees 1.0.5:

## Default S3 method:
extraTrees(x, y,
           ntree = 500,
           mtry = if (!is.null(y) && !is.factor(y))
                      max(floor(ncol(x)/3), 1) else floor(sqrt(ncol(x))),
           nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
           numRandomCuts = 1,
           evenCuts = FALSE,
           numThreads = 1,
           quantile = F,
           weights = NULL,
           subsetSizes = NULL,
           subsetGroups = NULL,
           tasks = NULL,
           probOfTaskCuts = mtry / ncol(x),
           numRandomTaskCuts = 1,
           na.action = "stop",
           ...)

subsetSizes: subset size (one integer) or subset sizes (vector of integers, requires subsetGroups). If supplied, every tree is built from a random subset of size subsetSizes. NULL means no subsetting, i.e. all samples are used.

subsetGroups: list specifying the subset group for each sample. From the samples in group g, each tree will randomly select subsetSizes[g] samples.

Bonus info: extraTrees does not sample with replacement; see the source package extraTrees_1.0.5.tar.gz, file AbstractTrees.java, lines 549-578. For each tree, the algorithm shuffles the input samples and takes the first subsetSizes entries from the shuffled vector. So, unless that is also implemented, the user cannot have OOB sampling together with sampling with replacement:

    protected int[] getInitialSamples(Random random) {
        if (subsetSizes == null) {
            return seq( input.nrows() );
        }
        if (subsetSizes.length == 1) {
            ArrayList<Integer> allIds = arrayToList( seq( input.nrows() ) );
            ShuffledIterator<Integer> shuffle = new ShuffledIterator<Integer>(allIds, random);

            int[] subset = new int[ subsetSizes[0] ];
            for (int i=0; i < subset.length; i++) {
                subset[i] = shuffle.next();
            }
            return subset;
        }
        // selecting random samples from each subset:
        int[] subset = new int[ sum(subsetSizes) ];
        int i = 0;
        for (int b=0; b < subsetSizes.length; b++) {
            ArrayList<Integer> ids = arrayToList( subsetElems[b] );
            ShuffledIterator<Integer> shuffle = new ShuffledIterator<Integer>(ids, random);

            // filling with elements from subset[b]:
            for (int n = i + subsetSizes[b]; i < n; i++) {
                subset[i] = shuffle.next();
            }

        }

        return subset; 
    }
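
The single-group branch above can be reduced to a small standalone sketch that also shows where an inbag row would come from. This is an illustration under assumptions, not the package's code: it uses Collections.shuffle from the standard library in place of the package's ShuffledIterator, and the names SubsetSketch, sampleWithoutReplacement and inbagRow are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: mirrors the shuffle-and-take-first idea only.
final class SubsetSketch {

    /** Shuffles the ids 0..n-1 and keeps the first subsetSize of them. */
    static int[] sampleWithoutReplacement(int n, int subsetSize, Random random) {
        if (subsetSize < 0 || subsetSize > n) {
            throw new IllegalArgumentException("subsetSize must be in [0, n]");
        }
        List<Integer> ids = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            ids.add(i);
        }
        Collections.shuffle(ids, random);   // stand-in for ShuffledIterator
        int[] subset = new int[subsetSize];
        for (int i = 0; i < subsetSize; i++) {
            subset[i] = ids.get(i);         // each id can appear at most once
        }
        return subset;
    }

    /** One row of an inbag matrix: true for samples the tree was built on. */
    static boolean[] inbagRow(int n, int[] subset) {
        boolean[] row = new boolean[n];
        for (int id : subset) {
            row[id] = true;
        }
        return row;
    }

    public static void main(String[] args) {
        int n = 10;
        int[] subset = sampleWithoutReplacement(n, 4, new Random(42));
        boolean[] row = inbagRow(n, subset);
        for (int i = 0; i < n; i++) {
            System.out.println(i + (row[i] ? " in-bag" : " out-of-bag"));
        }
    }
}
```

Because no id is drawn twice, every sample not in the subset is out-of-bag for that tree. Collecting one such row per tree is the information that would need to be exported for an OOB estimate.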
