Checking noisy data from random forest

Asked Mar 31 '16 at 02:29

Active Mar 31 '16 at 02:29

Viewed 450 times

I am running a random forest on unbalanced data with five classes (different behaviours). I have downsampled the data to address the imbalance, but I believe one of my classes contains a substantial amount of noise, as it is often misclassified. I would like to go back and see which observations are being predicted wrongly, to check whether the original data are mislabelled. Is there a way to do this?

asked Mar 31 '16 at 02:29

Laura

Try downsampling by stratification and grow plenty of trees. For very noisy data, growing more trees on smaller bootstrap samples actually gives better prediction performance. As a start, simply look at the out-of-bag predictions and the out-of-bag CV error; this answer should explain most of it: http://stats.stackexchange.com/questions/157714/r-package-for-weighted-random-forest-classwt-option/158030#158030 – Soren Havelund Welling Mar 31 '16 at 13:27
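To make the out-of-bag suggestion concrete, here is a minimal sketch in Python of how one might list the observations whose out-of-bag prediction disagrees with the recorded label. It is an illustration under assumptions, not the commenter's method: the question does not say which software is used, so scikit-learn is assumed, and the helper names (`fit_oob_labels`, `misclassified_rows`, `confusion_counts`) and the toy behaviour labels are hypothetical.

```python
from collections import Counter


def fit_oob_labels(X, y, n_trees=1000, seed=0):
    """Fit a forest and return one out-of-bag (OOB) label per training row.

    Assumption: scikit-learn and NumPy are installed. Imports are local so
    the pure-Python helpers below work without them.
    """
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    forest = RandomForestClassifier(
        n_estimators=n_trees,  # plenty of trees, so every row is OOB many times
        bootstrap=True,
        oob_score=True,        # required for oob_decision_function_
        random_state=seed,
    )
    forest.fit(X, y)
    # One row per training observation, one column per class, using only the
    # trees that did not see that observation. With very few trees a row can
    # be all NaN (never out-of-bag); argmax would then be meaningless.
    votes = forest.oob_decision_function_
    return forest.classes_[np.argmax(votes, axis=1)]


def misclassified_rows(y_true, y_oob):
    """Return (row_index, recorded_label, oob_label) for every disagreement."""
    return [
        (i, true, pred)
        for i, (true, pred) in enumerate(zip(y_true, y_oob))
        if true != pred
    ]


def confusion_counts(y_true, y_oob):
    """Count (recorded_label, oob_label) pairs: a confusion matrix as a dict."""
    return Counter(zip(y_true, y_oob))


# Toy demonstration with made-up labels (hypothetical behaviour classes).
recorded = ["rest", "rest", "feed", "walk", "feed"]
oob = ["rest", "feed", "feed", "walk", "walk"]

for row, true, pred in misclassified_rows(recorded, oob):
    print(f"row {row}: recorded as {true!r}, OOB prediction {pred!r}")
```

With real data, one would call `fit_oob_labels(X, y)` first and pass its result as `y_oob`. The rows returned by `misclassified_rows` are the candidates to inspect by hand, and `confusion_counts` shows which pairs of classes are confused most often. A disagreement does not prove a label is wrong; it only flags the observation for a closer look.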


0 Answers