Hello Cross Validated!
I have a question that I can't figure out.
I am building a classifier in R for a dichotomous outcome (0/1), using the random forest algorithm from the randomForest package.
The outcome variable is heavily imbalanced: the "1" class makes up only 7% of the cases, with n = 12000. The fitted model misclassifies 97% of the "1"s but under 0.1% of the "0"s.
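For reference, here is a minimal sketch of the baseline fit. The data frame `dat` and its predictors are simulated placeholders matching the numbers above (n = 12000, roughly 7% "1"s), not my real data:

```r
library(randomForest)

## Simulated stand-in for the real data: n = 12000, ~7% positives
set.seed(1)
n   <- 12000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(rbinom(n, 1, plogis(-3 + dat$x1)))

## Baseline random forest on the imbalanced outcome
fit <- randomForest(y ~ ., data = dat, ntree = 500)
fit$confusion  # OOB confusion matrix with per-class error rates
```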
Since random forest is essentially a form of bootstrapping, I tried stratifying the sampling on the outcome to get better classifications; a sketch of what I did follows below. The misclassification rates changed to 60% for the "1"s and 20% for the "0"s.
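This is roughly how I did the stratification, reusing `dat` from the sketch above. The `strata` and `sampsize` arguments of randomForest draw a stratified in-bag sample for each tree; the equal class counts here are my own choice for illustration:

```r
## Balanced per-tree sampling: draw as many "0"s as there are "1"s
n1 <- min(table(dat$y))  # size of the minority ("1") class
fit_strat <- randomForest(y ~ ., data = dat,
                          strata   = dat$y,       # stratify on the outcome
                          sampsize = c(n1, n1),   # equal draws per class
                          ntree    = 500)
fit_strat$confusion  # OOB confusion matrix after stratification
```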
My gut feeling tells me that stratifying on the outcome is not good practice, but I can't find anything specific on the subject.
This answer is related, but the situation there is different: Most interesting statistical paradoxes
That answer explains a "reverse" Simpson's paradox, which arises when one unknowingly stratifies on the outcome; here I would be stratifying deliberately.
So the question is: are there any negative effects of stratifying on the outcome when using a bootstrap-based method like random forest?