Plus/Minus Model accuracy from $R^2$

Question

I fit a linear regression for a model I was working on and obtained $R^2 = 0.801$.

Can one assess a $\pm$ error from this value for future predictions? I.e., if I now use this linear model to predict against a new set of data, can I use this $R^2$ value to get a $\pm$ value on that prediction?

score 6 · Accepted Answer · answered Jul 25 '18 at 16:17

No, you can't, for two reasons.

1. $R^2$ is the proportion of variance explained by your model, so it says nothing about the scale of the errors. $R^2=0.80$ can mean that you explain 80% of a very small variance, in which case your prediction intervals (PIs) should be narrow. Or it can mean that you explain 80% of a very large variance, in which case your PIs should be wide.
2. $R^2$ is an in-sample measure of model fit. In-sample fit can be a very misleading guide to out-of-sample predictive accuracy.
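
To illustrate the first reason, here is a minimal sketch on simulated data (the data, the seed, and the helper names `fit_simple_ols` and `r2_and_residual_sd` are all made up for illustration). Multiplying the response by 100 leaves $R^2$ unchanged but makes the residual standard deviation, and hence any $\pm$ band, 100 times larger.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r2_and_residual_sd(x, y):
    """Return in-sample R^2 and the residual standard deviation."""
    a, b = fit_simple_ols(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    my = statistics.fmean(y)
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - sse / sst
    resid_sd = (sse / (len(x) - 2)) ** 0.5  # n - 2 degrees of freedom
    return r2, resid_sd


# Simulated data (an assumption, not the asker's data).
rng = random.Random(0)
x = [i / 10 for i in range(200)]
y_small = [2.0 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]
y_large = [100 * yi for yi in y_small]  # same data, different units

r2_small, sd_small = r2_and_residual_sd(x, y_small)
r2_large, sd_large = r2_and_residual_sd(x, y_large)

print(f"small scale: R^2 = {r2_small:.3f}, residual SD = {sd_small:.3f}")
print(f"large scale: R^2 = {r2_large:.3f}, residual SD = {sd_large:.3f}")
```

Both lines report the same $R^2$, so the $R^2$ value alone cannot tell you which of the two error scales you are facing.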

To calculate PIs for multiple regression, take a look at this earlier thread: How to calculate the prediction interval for an OLS multiple regression?
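
As a rough sketch of what such a calculation looks like, the following covers only the simplest case of a single predictor, using the textbook formula $\hat{y}_0 \pm q \cdot s \sqrt{1 + 1/n + (x_0 - \bar{x})^2 / S_{xx}}$. The function name `prediction_interval` and the simulated data are hypothetical, and the quantile $q$ is taken from the normal distribution as a large-sample stand-in for the Student $t$ quantile with $n-2$ degrees of freedom, so the interval is slightly too narrow for small $n$.

```python
import random
import statistics


def prediction_interval(x, y, x0, level=0.95):
    """Approximate prediction interval for a new observation at x0.

    Simple (one-predictor) OLS only. Uses a normal quantile instead of
    the exact Student-t quantile, which is an approximation for large n.
    Returns (lower, point_prediction, upper).
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx

    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = (sse / (n - 2)) ** 0.5

    # Standard error of a new observation: noise plus parameter uncertainty.
    se_pred = s * (1 + 1 / n + (x0 - mx) ** 2 / sxx) ** 0.5

    q = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    y0 = intercept + slope * x0
    return y0 - q * se_pred, y0, y0 + q * se_pred


# Simulated data (an assumption, not the asker's data).
rng = random.Random(0)
x = [i / 10 for i in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]

lo, pred, hi = prediction_interval(x, y, x0=10.0)
lo_far, pred_far, hi_far = prediction_interval(x, y, x0=50.0)
print(f"x0 = 10: {pred:.2f}, interval [{lo:.2f}, {hi:.2f}]")
print(f"x0 = 50: {pred_far:.2f}, interval [{lo_far:.2f}, {hi_far:.2f}]")
```

Note that the interval depends on the residual standard deviation, the sample size, and how far $x_0$ lies from the training data. None of these can be read off $R^2$.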

ERT · Answer 2 · 2018-07-25T16:21:04.330

You are talking about Root Mean Squared Error of Prediction (RMSEP).

It is fundamentally different from the $R^2$ value, and the two are not related in the way you are hoping. Your $R^2$ value is the approximate proportion of the variance in $y$ (the dependent variable) that is explained by your $x$-matrix of covariates (the independent variables). It answers the question "how much variance can I explain with my given set of predictors?"

Your RMSEP (explained on this website on calibration) is the approximate size of the error that your model will make when predicting a future out-of-sample value. It answers the question "if I use my current model in the real world, how much error will it produce while predicting?"
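
A minimal sketch of how RMSEP can be estimated with a hold-out set is shown here, assuming simulated data and a single predictor (the helper `fit_simple_ols`, the 140/60 split, and the seed are illustrative choices, not part of the original question). The model is fit on the training portion only, and RMSEP is the square root of the mean squared prediction error on the held-out portion.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Simulated data (an assumption, not the asker's data).
rng = random.Random(1)
x = [rng.uniform(0, 20) for _ in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1.0) for xi in x]

# Hold out the last 60 points; the model never sees them during fitting.
x_train, y_train = x[:140], y[:140]
x_test, y_test = x[140:], y[140:]

intercept, slope = fit_simple_ols(x_train, y_train)

# In-sample R^2, computed on the training data only.
mean_train = statistics.fmean(y_train)
sse_train = sum(
    (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x_train, y_train)
)
sst_train = sum((yi - mean_train) ** 2 for yi in y_train)
r2_in_sample = 1 - sse_train / sst_train

# RMSEP: root of the mean squared error on the held-out data.
squared_errors = [
    (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x_test, y_test)
]
rmsep = statistics.fmean(squared_errors) ** 0.5

print(f"in-sample R^2 = {r2_in_sample:.3f}")
print(f"RMSEP on held-out data = {rmsep:.3f} (in the units of y)")
```

RMSEP is expressed in the units of $y$, which is what makes it usable as a rough $\pm$ figure, whereas $R^2$ is unitless.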

$R^2$ is used to determine how much variance a model explains. RMSEP is used to determine how well your model can predict out-of-sample values. They answer different questions, and you cannot obtain one from the other.
