R-squared vs MSE, why the discrepancy?

Question

I am working on a project that involves imputing missing data. To compare an imputed dataset with a baseline dataset, I fit a linear regression to each one and measure MSE and R-squared using 10-fold cross-validation.
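For concreteness, the evaluation described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the data are synthetic (not the datasets from this question), a single predictor stands in for the 16 features, the helper names `fit_simple_ols`, `cv_mse_r2` and `make_data` are made up for this example, and R-squared is computed from the pooled out-of-fold errors rather than averaged per fold. The two synthetic datasets are deliberately given different target variances.

```python
import random


def fit_simple_ols(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cv_mse_r2(xs, ys, k=10, seed=0):
    """Pooled out-of-fold MSE and R-squared from k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all rows
    sq_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_simple_ols(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        # Score only the rows that were not used for fitting.
        sq_errors.extend((ys[i] - (intercept + slope * xs[i])) ** 2 for i in fold)
    mse = sum(sq_errors) / len(sq_errors)
    mean_y = sum(ys) / len(ys)
    var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)
    return mse, 1.0 - mse / var_y


def make_data(n, x_spread, noise_sd, seed):
    """Hypothetical data: y = 2x + Gaussian noise (assumption, for illustration)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, x_spread) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


# Row counts mirror the question; everything else is invented.
baseline = make_data(814, x_spread=4.0, noise_sd=9.0, seed=1)
imputed = make_data(879, x_spread=7.0, noise_sd=11.0, seed=2)

mse_b, r2_b = cv_mse_r2(*baseline)
mse_i, r2_i = cv_mse_r2(*imputed)
print(f"baseline: MSE={mse_b:.1f}, R2={r2_b:.3f}")
print(f"imputed:  MSE={mse_i:.1f}, R2={r2_i:.3f}")
```

In this sketch the second dataset has both noisier and more spread-out targets, so each metric is computed against a different amount of total variation in the target.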

The problem is that for the baseline dataset I get MSE = 85.0 and R-squared = 45.5%, while for the imputed dataset I get MSE = 97.6 and R-squared = 47.2%.

So the MSE is lower (better) for the baseline, but the R-squared is higher (better) for the imputed dataset.
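One relation that may be relevant here, written out as a rough check: R-squared is one minus the MSE divided by the variance of the target, so the two metrics can only rank two datasets differently if the target variance differs between them. The arithmetic below assumes the reported R-squared values are percentages and that this identity holds approximately for the cross-validated figures (averaging per fold would change the numbers slightly).

$$
R^2 = 1 - \frac{\mathrm{MSE}}{\widehat{\mathrm{Var}}(y)}
\quad\Longrightarrow\quad
\widehat{\mathrm{Var}}(y) = \frac{\mathrm{MSE}}{1 - R^2}
$$

$$
\text{baseline: } \frac{85.0}{1 - 0.455} \approx 156.0,
\qquad
\text{imputed: } \frac{97.6}{1 - 0.472} \approx 184.8
$$

Under those assumptions, the implied target variance is larger in the imputed dataset, which would be enough for its MSE and its R-squared to both come out higher.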

I am trying to see if the imputed dataset is a better choice than the baseline, but I am now confused as to which metric to choose.

Both datasets have the same 16 features and the same target. The features are a mix of categorical and continuous variables, and the target is continuous. The baseline has 814 observations; the imputed set has 879. Neither dataset has been scaled or normalized.

Could you please advise on why there is no clear "winner", and on which metric I should use?

Thank you very much.

https://stats.stackexchange.com/questions/100281/selecting-the-best-model-using-cross-validation-on-coefficient-of-determination?rq=1 — Krantz, Apr 07 '19 at 17:28
Does this answer your question? [Selecting the best model using cross-validation on coefficient of determination and/or mean squared error](https://stats.stackexchange.com/questions/100281/selecting-the-best-model-using-cross-validation-on-coefficient-of-determination) — DannyDannyDanny, Jul 20 '20 at 14:31
It's **not** a duplicate: the issue here is models fitted before vs after imputation. — Thomas Lumley, Jul 21 '20 at 04:26

