
I have fitted a random forest regression model using an 80/20 data split for training and testing. The resulting model gives me an $R^2$ (OOB) of 0.21, while the $R^2$ computed on the test set is 0.82! That is a huge difference, and I am unsure whether my model explains 21% or 82% of the variability in the response. Should I believe the OOB $R^2$?

Sycorax
MVidM
  • [Note that in a nonlinear model like a random forest, $R^2$ lacks its usual “proportion of variance explained” interpretation.](https://stats.stackexchange.com/q/551915/247274) – Dave Dec 06 '21 at 23:55
  • Aside from @Dave's reasonable comment (+1), can you please report the MSE and MAE for the test set and for OOB? – usεr11852 Dec 16 '21 at 21:40

1 Answer


Despite $R^2$ not being mathematically valid for non-linear regressions, it is still widely used for model validation.

$R^2$ is a funny thing, and looking only at $R^2$ can lead you to wrong conclusions (even for linear models). The problem with $R^2$ is that you divide the sum of squared residuals by the sum of squared deviations from the mean.
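In symbols (the same definition Dave quotes in the comments below), with predictions $\hat y_i$ and observed mean $\bar y$:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2}$$

The denominator depends only on how spread out the observed $y$ values are, not on the model.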

This means that if your target is not spread out very much (e.g. suppose the true $y$ lies only between 0 and 0.2), then a flat line at the mean of $y$ could, at least in theory, be better than the predictions of some sophisticated machine learning model.

But it also means that small changes in your predictions can have a big impact on your $R^2$.
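To make this concrete, here is a minimal sketch (assuming NumPy and scikit-learn, with made-up data rather than the asker's): with a narrow target range, a small absolute prediction error already leaves only a modest $R^2$, and a constant prediction at the mean gets $R^2 = 0$ by construction.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Made-up narrow target: true y only lies between 0 and 0.2
y_true = rng.uniform(0.0, 0.2, size=200)

# Predictions that are off by a small amount in absolute terms
y_pred = y_true + rng.normal(0.0, 0.05, size=200)

# R^2 = 1 - SS_res / SS_tot, where SS_tot measures the spread of y around its mean
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print("manual  R^2:", 1 - ss_res / ss_tot)      # roughly 0.2-0.3 here
print("sklearn R^2:", r2_score(y_true, y_pred))
print("model   MSE:", mean_squared_error(y_true, y_pred))

# A constant prediction at the mean of y has R^2 = 0 by construction,
# yet its MSE is only modestly worse than the model's in this setup
y_mean = np.full_like(y_true, y_true.mean())
print("mean-only R^2:", r2_score(y_true, y_mean))
print("mean-only MSE:", mean_squared_error(y_true, y_mean))
```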

I would suggest the following:

  1. Add at least one more evaluation metric: mean absolute error is good, because you can get a good sense of its meaning. Also compute the root mean squared error (any values below 1 are great).

  2. Plot y against your predictions (if possible, for your OOB sample too; I am not sure whether sklearn exposes this directly, but see the sketch after this list).

  3. Check that your data was sampled randomly (I guess you used random sampling for the train/test split, but I am mentioning it just to be sure).

  4. Try changing some hyperparameters of your random forest (for example, reduce max_depth) and see whether you get the same difference.
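For points 1 and 2, a minimal sketch of what this could look like in scikit-learn; the make_regression data is only a placeholder for your data set, and the OOB predictions come from rf.oob_prediction_, which is available once oob_score=True.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression  # placeholder data only
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in for the asker's data
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # random 80/20 split
)

rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

# oob_score_ is the OOB R^2; oob_prediction_ holds the OOB predictions themselves
print("oob_score_ (OOB R^2):", rf.oob_score_)
oob_pred = rf.oob_prediction_
test_pred = rf.predict(X_test)

# Point 1: report R^2, MAE and RMSE for both OOB and test predictions
for name, y_obs, y_hat in [("OOB", y_train, oob_pred), ("test", y_test, test_pred)]:
    print(
        f"{name}: R2={r2_score(y_obs, y_hat):.3f}  "
        f"MAE={mean_absolute_error(y_obs, y_hat):.3f}  "
        f"RMSE={np.sqrt(mean_squared_error(y_obs, y_hat)):.3f}"
    )

# Point 2: observed y against predicted y, for OOB and test
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
for ax, (name, y_obs, y_hat) in zip(
    axes, [("OOB", y_train, oob_pred), ("test", y_test, test_pred)]
):
    ax.scatter(y_obs, y_hat, s=10, alpha=0.5)
    ax.axline((0, 0), slope=1, color="grey", linestyle="--")
    ax.set(title=name, xlabel="observed y", ylabel="predicted y")
plt.tight_layout()
plt.show()
```

If the OOB panel and the test panel look very different, that points toward the split or the sample size rather than the model itself.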

I am fairly sure this is some kind of bug and you should be able to solve it. If the data is split randomly, such a big difference should not be possible; the only other explanation would be a very small sample size, where a few outliers have a big impact on your performance.

Nick Cox
janrth
  • [$R^2$ is funky in the nonlinear case](https://stats.stackexchange.com/questions/551915/interpreting-nonlinear-regression-r2), but "not...mathematically valid for non-linear regressions" takes it too far. $R^2$ is just as valid as $MSE$. – Dave Dec 16 '21 at 21:06
  • But isn't it the case that, in a non-linear setting, the sum of the residuals does not behave relative to the mean prediction as it would in the linear case, which makes the interpretation basically invalid? This is what I mean by mathematically invalid: the metric does not return what you want to see. I would, for example, not pick the best model based on $R^2$ for a tree-based algorithm. But anyway... :) It is still used everywhere :) – janrth Dec 17 '21 at 11:44
  • How would you pick the best tree-based regression model, mean squared error or RMSE? – Dave Dec 17 '21 at 11:54
  • Good question. Probably (R)MSE. But I also like to look at MAE, because its interpretation is very straightforward. – janrth Dec 18 '21 at 11:05
  • Minimizing (R)MSE is equivalent to maximizing the “invalid” $R^2$. – Dave Dec 18 '21 at 12:50
  • There is this paper everybody refers to. They ran tons of simulations and found that $R^2$ is invalid for picking the best model. MSE and MAE should always work, no matter how non-linear your data is; then only the question of outliers decides whether you go with MSE or MAE. Here is the paper: Spiess, Andrej-Nikolai, and Natalie Neumeyer. "An evaluation of R2 as an inadequate measure for nonlinear models in pharmacological and biochemical research: a Monte Carlo approach." BMC Pharmacology. 2010; 10: 6. – janrth Dec 18 '21 at 21:11
  • $R^2$ is a monotonic function of MSE: $$R^2 = 1 - \dfrac{n\,\mathrm{MSE}}{\sum_i (y_i - \bar y)^2}$$ – Dave Dec 18 '21 at 22:53
  • The scaling denominator only has specific properties for purely linear functions. For example, it will be between 0 and 1 for linear functions, but it can be different for non-linear relations. To be honest, I have never gone deep into the theoretical explanation, but it is what you find fairly often when people talk about $R^2$, and it is also how I understand $R^2$. Try comparing a linear function against a concave function: I think the denominator will behave differently. Play around with different prediction values and compare $R^2$ against the MSE for both functions. – janrth Dec 19 '21 at 10:05
  • And then I think it should become obvious that the relation between $R^2$ and MSE differs between the two cases. Maybe I will try this now; I want to see what happens. And again, I am not a mathematician; my understanding of the problem is merely intuitive at the moment. By the way, I liked your decomposition of $R^2$ in your other post :) – janrth Dec 19 '21 at 10:09
  • The denominator is a property of the data. Whether you model with a linear regression, random forest, neural network, or how many times your dog barks when you tell her the values of the features, the denominator is the same. – Dave Dec 19 '21 at 21:02
  • Hey Dave, I created a simple notebook; I hope I did not make any mistakes. Please review it if you feel something is fishy. What you will see is a linear and a non-linear case. In both cases I create the predictions from y plus Gaussian(mean, std), and while the MSE is very similar in both cases, the $R^2$ is way higher for the non-linear case. I think this is what is often found in the literature: that $R^2$ for non-linear cases is unrealistically high. Anyway, maybe have a look and tell me what you think: https://github.com/janrth/r2_non_linearity/blob/master/r2_for_non_linear_function.ipynb – janrth Dec 20 '21 at 22:14
  • Please post that as a new question and write @Dave when you post back here with a link. You’ve made some (common) mistakes and deserve to have them addressed in a full answer, not just a comment. – Dave Dec 20 '21 at 22:42
  • @Dave: It was not my question in the first place. I just tried to help and said that $R^2$ is not (mathematically) valid for non-linear functions, which I think is common knowledge in research. Have you read the paper I mentioned above? Also, my simple example shows how unreliable $R^2$ is for non-linear functions. The question of whether $R^2$ is a valid metric for non-linear functions should have been answered here in some other posts, so I don't want to duplicate. But maybe you could point me to research that shows you can use $R^2$ for non-linear functions, so that I can also consider it. – janrth Dec 21 '21 at 10:33
  • You have the equation for $R^2$ in these comments. If you remain confused about how that is proof that MSE and $R^2$ are equivalent evaluation metrics, please post a new question. I don’t know where that is addressed explicitly on here, so I would not expect such a question to be closed as a duplicate. – Dave Dec 21 '21 at 10:58
  • @Dave: Please read the link I share in this comment. A pseudo-$R^2$ might be used, but it should not be used for model selection; MSE can be, as shown by Ratkowsky in 1990. I don't understand what your point really is. If you want to select your regression model based on $R^2$, then please do so; I also said that people still do it a lot. But my simple example also showed how very wrong a (pseudo-)$R^2$ can be compared to MSE for non-linear functions. I also pointed to the (probably) most cited paper on $R^2$ in non-linear systems, with thousands of simulations. – janrth Dec 21 '21 at 12:30
  • @Dave: https://www.r-bloggers.com/2021/03/the-r-squared-and-nonlinear-regression-a-difficult-marriage/ – janrth Dec 21 '21 at 12:37
  • @Dave: I have also now posted the question of whether MSE is valid for non-linear regression while $R^2$ might not be. I hope that will resolve the disagreement between us :) https://stats.stackexchange.com/questions/557863/is-r-squared-equivalent-to-mean-squared-error-for-non-linear-regression – janrth Dec 21 '21 at 12:38
  • "Values of RMSE below 1 are great": this is nonsensical, as a change of units can always achieve this. RMSE is 5 metres? Just change to km, and then RMSE is 0.005 km, so that solves the problem. – Nick Cox Dec 21 '21 at 14:12
  • This mixes much good advice with (in my view) some exaggeration and at least one outright error (see above). My edits are restricted to points of English expression. "Not being mathematically valid" is a poor criticism, as R-squared can be well defined, which the answer itself does. "Not a good idea or a good method" is a stance that can be explained. – Nick Cox Dec 21 '21 at 14:17