I have fitted a random forest model for regression and am having difficulty working out which measure to use for variable importance.

For regression, the importance function reports two measures: the mean decrease in accuracy (%IncMSE) and the mean decrease in node impurity (IncNodePurity).

My question is: why is the absolute value of importance not very helpful, and how can relative importance values be built?

user71837
  • Relative importance = scaled importance, usually scaled so that the most important variable = 100. – charles Mar 24 '15 at 07:31

1 Answer

Generally, IncNodePurity is taken as the main indicator of importance. An example adapted from ?importance:

> library(randomForest)
> set.seed(4543)
> data(mtcars)
> mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
+                           keep.forest=FALSE, importance=TRUE)
> importance(mtcars.rf)
       %IncMSE IncNodePurity
cyl  16.168645     169.96741
disp 18.672188     260.08722
hp   17.584375     184.95007
drat  6.948743      63.54528
wt   17.818509     254.30347
qsec  4.772889      33.25546
vs    5.303058      24.39064
am    5.210181      17.36626
gear  4.619161      21.55450
carb  8.577037      28.46715
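
As a rough sketch, each of the two columns can also be requested on its own through the `type` and `scale` arguments of `importance()`. The variable names `inc_mse`, `inc_purity` and `inc_mse_raw` are illustrative only, and the exact numbers may differ slightly between package versions.

```r
library(randomForest)

set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data = mtcars, ntree = 1000,
                          keep.forest = FALSE, importance = TRUE)

# type = 1: permutation-based measure (%IncMSE for regression)
inc_mse <- importance(mtcars.rf, type = 1)

# type = 2: total decrease in node impurity (IncNodePurity)
inc_purity <- importance(mtcars.rf, type = 2)

# scale = FALSE returns the permutation measure without dividing
# it by its standard error
inc_mse_raw <- importance(mtcars.rf, type = 1, scale = FALSE)

print(cbind(inc_mse, inc_purity))
```

Because the two measures are on different scales (one is a standardized increase in MSE, the other a sum of squares), their absolute values are hard to compare directly, which is one reason to rescale them.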


Relative importance can be computed relative to the minimum value of IncNodePurity, so that the least important variable has a value of 1:

      X.IncMSE IncNodePurity relativeIncNodePurity
am    5.210181      17.36626            1.000000
gear  4.619161      21.55450            1.241171
vs    5.303058      24.39064            1.404484
carb  8.577037      28.46715            1.639221
qsec  4.772889      33.25546            1.914947
drat  6.948743      63.54528            3.659123
cyl  16.168645     169.96741            9.787219
hp   17.584375     184.95007           10.649965
wt   17.818509     254.30347           14.643536
disp 18.672188     260.08722           14.976581
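
A minimal sketch of how a table like this could be produced is shown below. To keep it self-contained, the importance values are copied from the output above; with a fitted model one would presumably start from `data.frame(importance(mtcars.rf))` instead. The column `scaledIncNodePurity` is an illustrative addition (not part of the table above) that scales the most important variable to 100.

```r
# Importance values copied from the output above; with a fitted model
# one could instead use: imp <- data.frame(importance(mtcars.rf))
imp <- data.frame(
  X.IncMSE = c(16.168645, 18.672188, 17.584375, 6.948743, 17.818509,
               4.772889, 5.303058, 5.210181, 4.619161, 8.577037),
  IncNodePurity = c(169.96741, 260.08722, 184.95007, 63.54528, 254.30347,
                    33.25546, 24.39064, 17.36626, 21.55450, 28.46715),
  row.names = c("cyl", "disp", "hp", "drat", "wt",
                "qsec", "vs", "am", "gear", "carb")
)

# Relative to the least important variable (minimum becomes 1)
imp$relativeIncNodePurity <- imp$IncNodePurity / min(imp$IncNodePurity)

# Alternative scaling (illustrative): most important variable becomes 100
imp$scaledIncNodePurity <- 100 * imp$IncNodePurity / max(imp$IncNodePurity)

# Sort from least to most important, as in the table above
imp <- imp[order(imp$IncNodePurity), ]
print(imp)
```

Either scaling preserves the ranking of the variables; only the reference point changes (the weakest variable at 1, or the strongest at 100).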
rnso