From the training output below, it looks like R squared is not calculated by the traditional formula 1 - SSE/SST, since the row with the lower RMSE also has the lower R squared. How is it calculated?

  colsample_bytree  min_child_weight  RMSE      Rsquared 
  0.4               3                 16963.39  0.8799191
  0.4               5                 16813.24  0.8788395
user1569341

2 Answers

The code is here:

> R2
function(pred, obs, formula = "corr", na.rm = FALSE) {
    n <- sum(complete.cases(pred))
    switch(formula,
           corr = cor(obs, pred, use = ifelse(na.rm, "complete.obs", "everything"))^2,
           traditional = 1 - (sum((obs-pred)^2, na.rm = na.rm)/((n-1)*var(obs, na.rm = na.rm))))
  }

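With the default formula = "corr", the function computes the correlation between the observed and predicted values and then squares it; the traditional formula 1 - SSE/SST is used only when formula = "traditional". The squared correlation is unaffected by a constant offset or a rescaling of the predictions, so it does not have to move in step with RMSE. There are many formulas for R^2 in use. See Kvalseth, "Cautionary note about R^2", American Statistician (1985), vol. 39 (4), pp. 279-285. All of this is described at ?R2.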
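To make the difference between the two definitions concrete, here is a minimal base-R sketch on made-up numbers (the data and variable names are illustrative assumptions, not taken from the question). The predictions are perfectly correlated with the observations but badly miscalibrated, so the squared correlation is 1 while 1 - SSE/SST is negative.

```r
# Toy data, invented purely for illustration
obs  <- c(10, 12, 15, 19, 24)
pred <- 2 * obs + 5   # perfectly correlated with obs, but far off in level and scale

# "corr" definition: squared Pearson correlation
r2_corr <- cor(obs, pred)^2

# "traditional" definition: 1 - SSE/SST
# (SST equals (n - 1) * var(obs), which is the form used in the function above)
sse <- sum((obs - pred)^2)
sst <- sum((obs - mean(obs))^2)
r2_trad <- 1 - sse / sst

print(c(corr = r2_corr, traditional = r2_trad))
```

The two numbers agree for ordinary least squares fits evaluated on their own training data, but for resampled predictions from other models they can diverge like this.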

topepo

Update: The R2 function is deprecated. You can now get R2 from the postResample function.

The code is here; lines 136 and 142 show the formula:

resamplCor <- try(cor(pred, obs, use = "pairwise.complete.obs"), silent = TRUE)
...
out <- c(sqrt(mse), resamplCor^2, mae)

So postResample uses the squared correlation to calculate R2.
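This also explains the pattern in the question, where the lower RMSE comes with the lower R squared. The sketch below uses base R only; the helper names rmse and r2 and the data are hypothetical, written to mirror the two quoted lines rather than copied from caret. Model a is perfectly correlated with the observations but offset by a constant, while model b is close to the observations but not perfectly correlated.

```r
# Made-up observations and two sets of predictions (illustrative only)
obs    <- c(1, 2, 3, 4, 5)
pred_a <- obs + 3                      # perfect correlation, constant offset
pred_b <- c(1.5, 1.5, 3.5, 3.5, 5.5)   # small errors, imperfect correlation

# Hypothetical helpers mirroring the quoted calculation
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
r2   <- function(pred, obs) cor(pred, obs, use = "pairwise.complete.obs")^2

metrics <- rbind(
  a = c(RMSE = rmse(pred_a, obs), Rsquared = r2(pred_a, obs)),
  b = c(RMSE = rmse(pred_b, obs), Rsquared = r2(pred_b, obs))
)
print(metrics)
```

Model b has both the lower RMSE and the lower Rsquared, so ranking by RMSE and ranking by squared correlation need not agree.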

Agile Bean