In caret, the calculation of results$RMSE and results$Rsquared is not as simple as what you've indicated. They are in fact the averages of RMSE and $R^2$ over the ten holdout folds.
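For context, here is a minimal sketch of the kind of call that could produce an object like t1 below. The data are simulated stand-ins (the question's actual x and y aren't shown), so the numbers won't reproduce exactly; note that caret's default tuneLength = 3 is what yields the 3 x 3 alpha/lambda grid printed in the summary.

library(caret)

set.seed(42)
x <- matrix(rnorm(1000 * 20), ncol = 20)   # 1000 samples, 20 predictors
y <- rnorm(1000, sd = 18)                  # simulated response

t1 <- train(x, y, method = "glmnet",
            trControl = trainControl(method = "cv", number = 10))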
To confirm this, run the summary:
> t1
glmnet

1000 samples
  20 predictors

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 900, 900, 900, 900, 900, 900, ...
Resampling results across tuning parameters:

  alpha  lambda      RMSE      Rsquared
  0.10   0.01065054  17.93931  0.1655746
  0.10   0.10650539  17.93720  0.1656599
  0.10   1.06505391  17.89291  0.1678166
  0.55   0.01065054  17.93838  0.1657046
  0.55   0.10650539  17.91755  0.1668356
  0.55   1.06505391  17.84962  0.1731936
  1.00   0.01065054  17.93824  0.1657245
  1.00   0.10650539  17.90045  0.1678998
  1.00   1.06505391  17.92535  0.1710923

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were alpha = 0.55 and lambda = 1.065054.
For the optimal parameter combination alpha = 0.55 and lambda = 1.065054, the performance on each held-out fold can be seen in the object t1$resample:
> t1$resample
       RMSE   Rsquared Resample
1  18.42848 0.04479504   Fold05
2  21.17820 0.10500276   Fold08
3  18.27933 0.20858027   Fold04
4  17.31308 0.19080079   Fold07
5  16.60865 0.21812706   Fold10
6  20.07291 0.18737052   Fold02
7  16.48082 0.24041654   Fold03
8  17.18363 0.18379930   Fold06
9  17.29819 0.13669866   Fold09
10 15.65289 0.21634546   Fold01
(Needless to say, each row's RMSE and Rsquared are computed on a different CV fold, so the two columns need not rank the folds in the same order.) If you average these columns, you'll get:
> mean(t1$resample$RMSE)
[1] 17.84962
> mean(t1$resample$Rsquared)
[1] 0.1731936
...which are exactly the RMSE and Rsquared reported for alpha = 0.55 and lambda = 1.065054 in row 6 of the summary's tuning table.
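Assuming t1 is the fitted train object as above, you can also pull that row out programmatically instead of reading it off the printed table:

merge(t1$bestTune, t1$results)   # row for alpha = 0.55, lambda = 1.065054:
                                 # RMSE = 17.84962, Rsquared = 0.1731936
getTrainPerf(t1)                 # the same fold-averaged metrics, via caret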
EDIT: Why does averaging over folds disrupt the rank ordering? Suppose we have split the data into $F$ folds, and we are considering $C$ tuning combinations. For each combo $c$ and held-out fold $f$, the relationship between the $R^2$ and MSE calculated on fold $f$ is:
$$\operatorname{Rsquared}(c,f)=1-\frac{\operatorname{MSE}(c,f)}{\operatorname{Var}(f)},\tag{1}$$
where $\operatorname{Var}(f)$ is shorthand for the variance of the observed responses in fold $f$. It is certainly true that for a given $f$, if we average over all $c$ then the monotonic relationship between $R^2$ and MSE is preserved, since by linearity:
$$\frac1C\sum_c\operatorname{Rsquared}(c,f)=1-\frac{\frac1C\sum_c\operatorname{MSE}(c,f)}{\operatorname{Var}(f)}.\tag{2}$$
However, if we average (1) over all $f$ we cannot assert a similar statement, since the denominator $\operatorname{Var}(f)$, which varies with the fold being held out, gets in the way:
$$\frac1F\sum_f\operatorname{Rsquared}(c,f)=1-\frac1F\sum_f\left(\frac{\operatorname{MSE}(c,f)}{\operatorname{Var}(f)}\right).\tag{3}$$
The RHS of (3) cannot be simplified further to reveal a monotonic relationship between the average $R^2$ over all folds and the average MSE over all folds.
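To see the failure concretely, here is a toy example (made-up numbers): two tuning combos evaluated on two folds whose held-out responses have very different variances.

var_f <- c(1, 100)                       # Var(f) for folds 1 and 2
mse   <- rbind(A = c(0.5, 50),           # MSE(c, f) for combo A
               B = c(0.9, 40))           # MSE(c, f) for combo B
r2    <- 1 - sweep(mse, 2, var_f, "/")   # Rsquared(c, f) via equation (1)

rowMeans(mse)   # A = 25.25, B = 20.45  -> B wins on fold-averaged MSE
rowMeans(r2)    # A = 0.50,  B = 0.35   -> yet A wins on fold-averaged R^2

So a selection rule based on fold-averaged MSE would prefer combo B, while one based on fold-averaged $R^2$ would prefer combo A.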
Since RMSE is the square root of MSE, and averaging does not commute with the square root, the relationship between fold-averaged $R^2$ and fold-averaged RMSE is even less direct. Indeed, for any given fold, there is not even an analog of (2) relating combo-averaged $R^2$ to combo-averaged RMSE.