8

I checked both the randomForest and randomForestSRC (rfsrc) packages in R, but couldn't find an easy way to apply observation (case) weights when training a random forest model. Is there any way to do this?

As an alternative, I thought about replicating my observations (e.g. replicating an observation once if it has a weight of 2), but I think this would be inefficient, and difficult for non-integer case weights.

xiaoxiao87

2 Answers

5

Do not duplicate observations to up-weight them. Copies of the same observation can end up both in-bag and out-of-bag for a given tree, so the out-of-bag error estimate becomes far too optimistic.
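To illustrate the out-of-bag concern, here is a minimal simulation sketch in plain Python. It does not fit a forest; it only mimics the bootstrap step and counts how often an out-of-bag row has an identical copy among the in-bag rows of the same tree. The function name `oob_leak_rate` and all parameter values are hypothetical, chosen for illustration only.

```python
import random

def oob_leak_rate(n_unique=200, copies=3, n_trees=500, seed=0):
    """Fraction of out-of-bag rows whose exact duplicate is in-bag for the same tree."""
    rng = random.Random(seed)
    # Row r is a copy of original observation r // copies.
    n_rows = n_unique * copies
    leaked = 0
    total_oob = 0
    for _ in range(n_trees):
        # Standard bootstrap: draw n_rows row indices with replacement.
        in_bag_rows = {rng.randrange(n_rows) for _ in range(n_rows)}
        in_bag_originals = {row // copies for row in in_bag_rows}
        for row in range(n_rows):
            if row not in in_bag_rows:
                total_oob += 1
                # The tree never saw this row, but it did see an identical copy.
                if row // copies in in_bag_originals:
                    leaked += 1
    return leaked / total_oob

print(oob_leak_rate(copies=1))  # no duplication, so no leakage: 0.0
print(oob_leak_rate(copies=3))  # most out-of-bag rows have a twin in-bag
```

With three copies of every observation, the leaked fraction should be close to 1 - exp(-2), about 0.86, so most "out-of-bag" predictions are made on rows the tree has effectively already seen.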

Both stratification and class weighting are implemented in randomForest; here are some other threads on that:

random-forest-with-classes-that-are-very-unbalanced

R package for Weighted Random Forest? classwt option?

Weighting more recent data in Random Forest model

  • 2
    I think the OP was asking about weighting individual cases, not only weighting cases by class. In the former scenario, instances from the same class can have different weights. Yes, you are right: if you over-sample, you have to be careful about your out-of-bag error. However, if you are careful to re-sample only the training set, you can rely on the test set error. – Simone Aug 11 '15 at 00:41
1

Replicating your observations might be a good idea. I know that WEKA allows different weights for each instance.

From WEKA's wiki:

This feature exists in versions of Weka >= 3.5.8.

A weight can be associated with an instance in a standard ARFF file by appending it to the end of the line for that instance and enclosing the value in curly braces. E.g.:

@data
0, X, 0, Y, "class A", {5}

For a sparse instance, this example would look like:

@data
{1 X, 3 Y, 4 "class A"}, {5}
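As a rough sketch of how such lines could be generated, the hypothetical Python helpers below (`arff_value`, `dense_row`, `sparse_row` are illustrative names, not part of WEKA or RWeka) reproduce the two example rows above. The quoting rule is deliberately simplified (only values containing whitespace are quoted), and it is an assumption here that fractional weights are written the same way as integer ones.

```python
def arff_value(value):
    """Render one attribute value, double-quoting strings that contain whitespace."""
    text = str(value)
    return f'"{text}"' if any(ch.isspace() for ch in text) else text

def dense_row(values, weight):
    """Dense instance: comma-separated values, then the weight in curly braces."""
    return ", ".join(arff_value(v) for v in values) + ", {" + str(weight) + "}"

def sparse_row(values, weight):
    """Sparse instance: 'index value' pairs for non-zero values, then the weight."""
    entries = [f"{i} {arff_value(v)}" for i, v in enumerate(values) if v != 0]
    return "{" + ", ".join(entries) + "}, {" + str(weight) + "}"

row = [0, "X", 0, "Y", "class A"]
print(dense_row(row, 5))     # 0, X, 0, Y, "class A", {5}
print(sparse_row(row, 5))    # {1 X, 3 Y, 4 "class A"}, {5}
print(dense_row(row, 2.5))   # 0, X, 0, Y, "class A", {2.5}
```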

If you still want to use R you might try the package RWeka.

Simone
  • Maybe you meant "might not be a good idea". Here's an example: http://stats.stackexchange.com/questions/164111/why-bootstrap-result-in-overfitting-for-randomforest-prediction#comment312194_164111 – Soren Havelund Welling Aug 10 '15 at 14:33
  • Actually I really meant it might be a good idea. WEKA weighted instances simulate oversampling of particular instances and I think a group of replicated instances will always follow the same path on a decision tree. – Simone Aug 11 '15 at 00:37
  • Sorry then :) And then also the same path through the forest, where a group is either only in-bag or only out-of-bag for a given tree? – Soren Havelund Welling Aug 11 '15 at 00:45
  • 1
    Yeah, with forests, as you say, we might have some problems: some instances can be in-bag and some out-of-bag. However, I guess if you don't rely on the out-of-bag error too much it should be fine. Anyway, I would prefer to work with off-the-shelf techniques that allow weighting instances rather than resampling them :) – Simone Aug 11 '15 at 00:49
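The claim in the comments that, within a single tree, an integer instance weight behaves like that many replicated copies can be checked on a split criterion. The sketch below assumes Gini impurity with class proportions computed from summed weights; this is an illustrative assumption, not a description of how WEKA or any R package implements weighting, and `weighted_gini` is a hypothetical helper.

```python
from collections import Counter

def weighted_gini(labels, weights):
    """Gini impurity where each instance contributes its weight to its class."""
    total = sum(weights)
    mass = Counter()
    for label, weight in zip(labels, weights):
        mass[label] += weight
    return 1.0 - sum((m / total) ** 2 for m in mass.values())

labels = ["A", "A", "B", "B"]
weights = [3, 1, 2, 1]

# Replicate each instance as many times as its (integer) weight.
replicated = [lab for lab, w in zip(labels, weights) for _ in range(w)]

print(weighted_gini(labels, weights))
print(weighted_gini(replicated, [1] * len(replicated)))
```

Both calls give the same impurity (24/49 here), and the weighted form also accepts non-integer weights, which replication cannot express directly. The equivalence holds only inside one tree; it says nothing about the in-bag/out-of-bag split discussed above.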