I posted this question on stack.exchange but was advised to move it here. I am building predictive models with the pls and h2o packages, which gives two models: pls.model and h2o.model. The R2 (squared Pearson correlation) and RMSE for each round of cross-validation are shown below.
R2:
 i     R2.PLS     R2.H2O
 1  0.4415108  0.6232292
 2  0.3754088  0.6056992
 3  0.4267580  0.6204750
 4  0.3505282  0.6062691
 5  0.2870766  0.5344183
 6  0.3858786  0.5794828
 7  0.3449946  0.5692314
 8  0.2974582  0.5522208
 9  0.3446449  0.5694339
10  0.3987684  0.5561757
RMSE:
 i   rmse.pls  rmse.h2o
 1   8.839967  40.99896
 2   9.347349  29.94260
 3   4.240366  14.75890
 4  17.901563  29.89181
 5   4.686803  66.04993
 6  31.717909  10.28799
 7   2.066342  32.74828
 8  15.979214  21.05928
 9  19.454079  10.88551
10  27.039400  68.27017
I am unable to explain why pls.model has the lower R2 but also the lower RMSE in most rounds (8 of 10), while h2o.model has the higher R2 but the higher RMSE. I checked the scatter plot, but no non-linear pattern appears. Do you have any thoughts on this? And in this case, which model should be considered the better one?
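For reference, here is a minimal sketch of how the two metrics are computed. It uses simulated data, not the data behind the tables above; `obs`, `pred_a`, `pred_b`, `r2` and `rmse` are hypothetical names made up for illustration. It shows one general way the two metrics can disagree: the squared Pearson correlation does not change when predictions are shifted or rescaled, whereas RMSE does. Whether that is what happens with my models is only an assumption.

```r
# Simulated example only: these are NOT the values from the tables above.
set.seed(1)
obs    <- rnorm(100, mean = 50, sd = 10)

# Hypothetical predictions:
pred_a <- obs + rnorm(100, sd = 8)           # noisy, but on the right scale
pred_b <- 2 * obs + 30 + rnorm(100, sd = 4)  # tracks obs closely, but shifted and rescaled

# Squared Pearson correlation, as defined in the question
r2   <- function(obs, pred) cor(obs, pred)^2
# Root mean squared error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

metrics <- data.frame(
  model = c("a", "b"),
  R2    = c(r2(obs, pred_a), r2(obs, pred_b)),
  RMSE  = c(rmse(obs, pred_a), rmse(obs, pred_b))
)
print(metrics)

# Shifting and rescaling the predictions leaves the squared correlation unchanged
print(all.equal(r2(obs, pred_a), r2(obs, 3 * pred_a + 5)))
```

In this sketch, model "b" gets the higher R2 and also the higher RMSE, because its predictions follow the observed values closely but sit on a different scale. Comparing `mean(pred - obs)` per round would be one way to check whether something similar applies to h2o.model.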
Thanks,
Phuong