What causes the high OOB-error for randomForest() in R?

Question

I'm trying to fit a random forest in R on a dataset with 16364 observations (after undersampling), using the function randomForest(). But my results look really weird:

What could have caused this? My data was very unbalanced at first, which is why I used undersampling. Maybe I don't understand undersampling correctly. My impression when I read about it was that if the minority class contains n observations, you can randomly sample n observations from the majority class and combine them with all the observations of the minority class to get a new, balanced dataset.
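A minimal sketch of the undersampling procedure described above, in base R. The data frame `dat`, its column names, and the simulated values are hypothetical stand-ins, not the actual dataset:

```r
set.seed(42)

# Hypothetical imbalanced data: roughly 8% "Yes", 92% "No"
n_total <- 10000
dat <- data.frame(
  x1 = rnorm(n_total),
  x2 = rnorm(n_total),
  y  = factor(
    sample(c("Yes", "No"), n_total, replace = TRUE, prob = c(0.08, 0.92)),
    levels = c("No", "Yes")
  )
)

# Row indices of each class
minority_idx <- which(dat$y == "Yes")
majority_idx <- which(dat$y == "No")
n_min <- length(minority_idx)

# Draw as many majority rows as there are minority rows, without replacement
sampled_majority_idx <- sample(majority_idx, n_min)

# Combine both sets of rows and shuffle them so the classes are not
# stored in two contiguous blocks
balanced <- dat[sample(c(minority_idx, sampled_majority_idx)), ]

# Class counts after undersampling: both classes have n_min rows
table(balanced$y)
```

Checking `table(balanced$y)` and `nrow(balanced)` right after the sampling step is a quick way to confirm that the balanced dataset contains what was intended before any model is fitted.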

Before I used undersampling, the OOB error was very low, but the minority class was very poorly classified.

I would be most grateful if someone could help me realise what's wrong here, thank you in advance!

It's very weird. Can you provide a bit more information about how this was fitted? It's hard to tell what went wrong just by looking at a table — StupidWolf, May 11 '20 at 14:20
@StupidWolf Thank you for answering. My dataset has more than 100 000 observations and 13 predictors. I want to classify those observations into two categories, Yes/No. But the Yes-class constitutes only 8% of the data, so the result was a very skewed error rate. So I tried undersampling, and I'm starting to believe that is where the error lies. I believe that I somehow have the same small subset in every tree, and that is probably why this weird result occurs. But I don't know how I should implement it correctly. I've read about strata and sampsize, but I'm not sure I understand how they work. — AnnieFrannie, May 12 '20 at 07:11
Hi @AnnieFrannie, it could be. Do you mind sharing the code you used to perform the sampling? Hopefully it's not too long, keep it to just the sampling part. It's more useful than the results :) — StupidWolf, May 12 '20 at 13:46
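The `strata` and `sampsize` arguments of randomForest() mentioned above can be sketched as follows. This is only an illustration on simulated data (the data frame `dat` and its columns are hypothetical), and it shows per-tree stratified sampling as an alternative to undersampling the data beforehand, not a confirmed fix for the result in the question:

```r
library(randomForest)

set.seed(42)

# Hypothetical imbalanced data: roughly 8% "Yes", 92% "No"
n_total <- 5000
dat <- data.frame(
  x1 = rnorm(n_total),
  x2 = rnorm(n_total),
  y  = factor(
    sample(c("Yes", "No"), n_total, replace = TRUE, prob = c(0.08, 0.92)),
    levels = c("No", "Yes")
  )
)

# Size of the minority class
n_min <- sum(dat$y == "Yes")

# Fit on the full data. For every tree, n_min rows are drawn from each
# class (stratum), so each tree sees a balanced sample, while a different
# subset of the majority class is drawn for each tree.
rf <- randomForest(
  x        = dat[, c("x1", "x2")],
  y        = dat$y,
  strata   = dat$y,
  sampsize = c(n_min, n_min),
  ntree    = 200
)

# OOB confusion matrix with per-class error rates
print(rf$confusion)
```

Because the predictors here are pure noise, the OOB error of this toy model is not meaningful; the point is only how `strata` and `sampsize` are passed.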
