In a random forest, is larger %IncMSE better or worse?

Question

Once I have built a (regression) random forest model in R, the call rf$importance provides me with two measures for each predictor variable, %IncMSE and IncNodePurity. Is the interpretation that predictor variables with smaller %IncMSE values are more important than predictor variables with larger %IncMSE values?

How about for IncNodePurity?

score 35 · Accepted Answer · edited Apr 13 '17 at 12:44

%IncMSE is the most robust and informative measure. It is the increase in the MSE of predictions (estimated with out-of-bag CV) that results from variable j being permuted (its values randomly shuffled).

Grow the regression forest. Compute the OOB MSE and call it mse0.
For each predictor variable j: permute the values of column j, then predict and compute the OOB MSE, mse(j).
The %IncMSE of the j'th variable is (mse(j) - mse0) / mse0 * 100%.

The higher the number, the more important the variable.
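
The three steps can be sketched by hand in R. This is only an illustration under stated assumptions: the data are synthetic, the names X, y, oob_mse and pct_inc_mse are made up for the example, and the forest-level OOB prediction is rebuilt from per-tree predictions and the in-bag record. As far as I know, the randomForest package itself works tree by tree and, by default, scales the averaged difference by its standard deviation, so the values it reports will not match these percentages exactly.

```r
library(randomForest)

set.seed(1)
n <- 300
# Synthetic data (hypothetical): y depends strongly on x1, weakly on x2, not on x3.
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 3 * X$x1 + 0.5 * X$x2 + rnorm(n)

# keep.inbag = TRUE records which rows each tree was trained on.
rf <- randomForest(X, y, ntree = 300, keep.inbag = TRUE)

# OOB MSE of the forest for a given predictor frame:
# average each row's prediction over only the trees that did not train on it.
oob_mse <- function(newX) {
  per_tree <- predict(rf, newX, predict.all = TRUE)$individual  # n x ntree
  per_tree[rf$inbag != 0] <- NA        # drop predictions from in-bag trees
  oob_pred <- rowMeans(per_tree, na.rm = TRUE)
  mean((y - oob_pred)^2, na.rm = TRUE)
}

mse0 <- oob_mse(X)                     # step 1: baseline OOB MSE

pct_inc_mse <- sapply(names(X), function(j) {
  Xp <- X
  Xp[[j]] <- sample(Xp[[j]])           # step 2: shuffle column j only
  (oob_mse(Xp) - mse0) / mse0 * 100    # step 3: percentage increase
})

print(round(pct_inc_mse, 1))
```

Shuffling x1 should raise the error far more than shuffling x3, which is why a larger value marks a more important variable.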

IncNodePurity relates to the loss function by which the best splits are chosen. The loss function is MSE for regression and Gini impurity for classification. More useful variables achieve larger increases in node purity, that is, they yield splits with a high inter-node 'variance' and a small intra-node 'variance'. IncNodePurity is biased and should only be used if the extra computation time of calculating %IncMSE is unacceptable. Since calculating %IncMSE takes only ~5-25% extra time, this would almost never happen.
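
To look at both measures side by side, a minimal sketch along these lines may help. It assumes the randomForest package and synthetic data; the names X, y, rank_perm and rank_purity are made up for the example. Note that the permutation measure is only computed when the forest is grown with importance = TRUE.

```r
library(randomForest)

set.seed(2)
n <- 300
# Synthetic data (hypothetical): x1 matters most, x3 not at all.
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- 3 * X$x1 + 0.5 * X$x2 + rnorm(n)

# importance = TRUE asks for the permutation measure in addition to node purity.
rf <- randomForest(X, y, ntree = 300, importance = TRUE)

imp <- importance(rf)   # matrix with columns %IncMSE and IncNodePurity
print(imp)

# Rank predictors under each measure; larger values mean more important in both.
rank_perm   <- rownames(imp)[order(imp[, "%IncMSE"], decreasing = TRUE)]
rank_purity <- rownames(imp)[order(imp[, "IncNodePurity"], decreasing = TRUE)]
print(rank_perm)
print(rank_purity)
```

On data this clean the two rankings will usually agree; the bias of IncNodePurity tends to show up with predictors of very different scales or numbers of categories.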

A similar question and answer
