
I would like to know how to retrieve (or compute) $R^2$ for a model fitted with cforest() from the package party. The function randomForest() from the package of the same name reports a coefficient of determination, while cforest() does not. According to the question "Manually calculated $R^2$ doesn't match up with randomForest() $R^2$ for testing new data", the package randomForest computes $R^2$ with the following formula:

R2 <- 1 - sum((y - predicted)^2) / sum((y - mean(y))^2)  # y is the actual (observed) value

However, when I apply this formula after fitting a random forest to my data, I get a totally different result. randomForest() reported 83.33% of explained variation, whereas the formula above applied to the cforest() predictions gave a bit more than 43%.
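To rule out the formula itself, here is a minimal sketch that wraps it in a function and checks it against a case where the answer is known. The helper name `r2_score`, the built-in `cars` dataset and the `lm()` fit are assumptions made for illustration; none of them come from party or randomForest.

```r
# Hypothetical helper (not part of party or randomForest):
# R^2 = 1 - SSE/SST for any vector of predictions.
r2_score <- function(y, predicted) {
  y <- as.numeric(y)
  predicted <- as.numeric(predicted)  # also accepts a one-column matrix of predictions
  stopifnot(length(y) == length(predicted))
  1 - sum((y - predicted)^2) / sum((y - mean(y))^2)
}

# Sanity check on a linear model with an intercept, where the formula
# should reproduce the R^2 reported by summary()
fit <- lm(dist ~ speed, data = cars)
r2_manual <- r2_score(cars$dist, fitted(fit))
r2_lm <- summary(fit)$r.squared
print(all.equal(r2_manual, r2_lm))
```

The length check is there on purpose: if the response vector and the predictions do not cover the same rows (for example after dropping missing values), the formula still returns a number, but a meaningless one.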

I wonder whether the formula above can be applied when cforest() is used, and whether there is an easier way to retrieve an $R^2$ (or even an $R^2_{adj}$) in this particular case.

I thank you in advance for your help and your explanations.

CBechet.

EDIT: after much reading, I tried the following:

library(party)

model <- cforest(y ~ a + b + c, data = mydata, controls = cforest_unbiased(ntree = 2000, mtry = 2))

y <- mydata$y  # observed response, taken from the same data frame the model was fitted on
oob.pred <- predict(model, type = "response", OOB = TRUE)  # out-of-bag predictions
residual <- y - oob.pred
mse <- sum(residual^2) / length(y)

pseudo.R2 <- 1 - mse / var(y)  # it yielded pseudo R^2 = 0.4327, so 43.27% of explained variance
p <- 3  # number of predictors in the model
adj.R2 <- 1 - (1 - pseudo.R2) * ((length(y) - 1) / (length(y) - p - 1))  # yielded 0.394, so 39.4% of explained variance
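For what it's worth, the pseudo-$R^2$ line above divides by var(y), which uses the $n-1$ denominator, while the formula quoted earlier works with the raw sums of squares. Writing $\mathrm{SSE}=\sum_i (y_i-\hat y_i)^2$ and $\mathrm{SST}=\sum_i (y_i-\bar y)^2$ (notation introduced here, with $n$ the number of observations), the two versions are related by:

$$
1 - \frac{\mathrm{MSE}}{\operatorname{var}(y)}
= 1 - \frac{\mathrm{SSE}/n}{\mathrm{SST}/(n-1)}
= 1 - \frac{n-1}{n}\cdot\frac{\mathrm{SSE}}{\mathrm{SST}}
$$

So they differ only by the factor $(n-1)/n$, which is close to 1 unless the sample is tiny. That cannot by itself account for a gap as large as 83% versus 43%.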

When I use the function randomForest, I get this:

model <- randomForest(y ~ a + b + c, data = mydata, ntree = 2000, mtry = 2, importance = TRUE)
print(model)  # % of explained variation = 83.33% (see the output below)

Call:
 randomForest(formula = y ~ a + b + c, data = mydata, ntree = 2000,      mtry = 2, importance = TRUE) 
               Type of random forest: regression
                     Number of trees: 2000
No. of variables tried at each split: 2

          Mean of squared residuals: 0.0110483
                    % Var explained: 83.33

What am I doing wrong when using cforest?
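One way to narrow this down is to back out what the quoted numbers imply. The sketch below is plain arithmetic on figures already given (MSE 0.0110483 and 83.33% for randomForest(), 43.27% for cforest()). It assumes both $R^2$ values refer to the same response vector and ignores the small $(n-1)/n$ difference between var(y) and the sum-of-squares version; the variable names are made up for illustration.

```r
# Figures quoted in the question
rf_mse <- 0.0110483   # "Mean of squared residuals" printed by randomForest()
rf_r2  <- 0.8333      # "% Var explained" printed by randomForest()
cf_r2  <- 0.4327      # pseudo R^2 computed by hand for cforest()

# R^2 = 1 - MSE / Var(y)  =>  Var(y) = MSE / (1 - R^2)
var_y_implied <- rf_mse / (1 - rf_r2)

# OOB MSE that cforest() would need to give cf_r2 on the same response
cf_mse_implied <- (1 - cf_r2) * var_y_implied

# How many times larger the implied cforest() OOB error is
mse_ratio <- cf_mse_implied / rf_mse

print(round(c(var_y = var_y_implied, cf_mse = cf_mse_implied, ratio = mse_ratio), 4))
```

If the hand-computed cforest() MSE really is around 3.4 times the randomForest() one, then the gap lies in the out-of-bag predictions themselves (or in which rows of y they are being compared against), not in the $R^2$ formula.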


If I provide a reproducible example, I get the following results:

#### Minimal reproducible example ####

### Loading the packages and the dataset 'airquality' ###

> library(randomForest)
> library(party)
> data("airquality")

### Creating a first random forest, using randomForest() from the package 'randomForest' ###

> set.seed(131) # to get the same result each time the random forest is created

> ozone.rf.1 <- randomForest(Ozone ~ ., data=airquality, mtry=3, importance=TRUE, na.action=na.omit) # running the model on the complete cases only
> print(ozone.rf.1)

Call:
 randomForest(formula = Ozone ~ ., data = airquality, mtry = 3,      importance = TRUE, na.action = na.omit) 
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 3

          Mean of squared residuals: 303.8304
                    % Var explained: 72.31

## MSE = 303.8304
## pseudo-R^2 = 0.7231

### Creating a second random forest, using cforest() from the package 'party' ###

> set.seed(131) # to get the same result each time the random forest is created
> airquality.2 <- na.omit(airquality) # complete cases only
> ozone.rf.2 <- cforest(Ozone ~ ., data=airquality.2, controls=cforest_unbiased(mtry=3)) # running the model on the complete cases only
> oob.pred <- predict(ozone.rf.2, type="response", OOB=TRUE) # out-of-bag predictions
> residual <- airquality.2$Ozone - oob.pred
> mse <- sum(residual^2)/length(airquality.2$Ozone)
> pseudo.R2 <- 1 - mse/var(airquality.2$Ozone)
> adj.R2 <- 1 - (1-pseudo.R2)*((length(airquality.2$Ozone)-1)/(length(airquality.2$Ozone)-5-1)) # 5 predictors; 0.7042, so 70.42% of explained variance

## MSE = 312.672
## pseudo-R^2 = 0.7176

I think that the small difference between the two $R^2$ values here (0.7231 versus 0.7176) is due to the different algorithms: cforest() grows conditional inference trees, whereas randomForest() grows CART-style trees.

However, when I use the same procedure on my own dataset (which I can't use as a reproducible example here for lack of space), I get a huge difference between the two $R^2$ values (see my question above). I hope someone can help me sort this out.

I thank you all in advance!
