
I am using a RandomForest regressor on my data, and the OOB score came out to be 0.83. I am not sure how that can be. My targets are large values, on the order of 10^7, so if it were the MSE it should be much higher. I don't understand what 0.83 signifies here.

I am using Python's RandomForestRegressor from the scikit-learn toolkit.

I do

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(max_depth=7, n_estimators=100, oob_score=True, n_jobs=-1)
model.fit(trainX, trainY)

Then I look at model.oob_score_ and get values like 0.83809026152005295.

user34790

1 Answer


To compare the ground-truth (i.e. correct/actual) target values with the target values estimated (i.e. predicted) by the random forest, scikit-learn doesn't use the MSE but $R^2$ (unlike e.g. MATLAB or Breiman (1996b)), as you can see in the code of forest.py:

self.oob_score_ = 0.0
for k in xrange(self.n_outputs_):
    self.oob_score_ += r2_score(y[:, k], predictions[:, k])
self.oob_score_ /= self.n_outputs_

r2_score() computes the coefficient of determination, a.k.a. $R^2$, defined as $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. Its best possible score is 1.0 and lower values are worse (it can even be negative). Because it is a ratio, it does not depend on the scale of the targets, which is why you get 0.83 even though your targets are around 10^7.
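
To make this concrete, here is a minimal sketch (the synthetic data from make_regression and all the specific numbers are mine, not from the question) showing that, for a single-output regressor, oob_score_ is simply the $R^2$ of the out-of-bag predictions stored in oob_prediction_, and that blowing the targets up to the order of 10^7 does not change it:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Toy data with targets scaled up to the order of 1e7, mimicking the question's setup.
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
y = y * 1e7

model = RandomForestRegressor(max_depth=7, n_estimators=100, oob_score=True,
                              n_jobs=-1, random_state=0)
model.fit(X, y)

# oob_prediction_ holds the out-of-bag prediction for each training sample;
# recomputing R^2 from it reproduces oob_score_, whatever the scale of y.
print(model.oob_score_)
print(r2_score(y, model.oob_prediction_))

If you do want an error measure in the units of your targets, you can compute it yourself from the out-of-bag predictions, e.g. with sklearn.metrics.mean_squared_error(y, model.oob_prediction_).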


Franck Dernoncourt