1

I have used a deep neural network to perform regression with multiple independent variables, predicting one dependent variable.

To assess the quality of the regression I have used $R^2$, but that statistic is typically associated with linear regression.

My question is: can I use the $R^2$ coefficient to judge the quality of such a regression? Please take into account that the problem I'm focusing on is non-linear. If not, which coefficient should I use instead of $R^2$ in the case of non-linear regression?

Thank you in advance

Georgy Firsov
  • 218
  • 1
  • 6
user3043636
  • 123
  • 4
  • $R^2$ does not have the “proportion of variance explained” interpretation in the case of nonlinear regression, of which a neural network is one example: https://stats.stackexchange.com/a/500456/247274. I derive that fact in another post, but it should be in any introductory regression textbook that works with linear algebra, such as Agresti’s “Foundations of Linear and Generalized Linear Models”. – Dave Jan 26 '21 at 01:49
  • The bizarre part of $R^2$, even in the linear case, is that it isn’t as simple as “$90\%$ is an $\text{A}$ in school, so $R^2=0.9$ is good.” Such a value might be poor in some settings, while $0.2$ might be wonderful in others. – Dave Jan 26 '21 at 01:56

3 Answers

0

$R^2$ can be used. You can also check the loss functions commonly used in regression settings, such as MSE (mean squared error), MAE (mean absolute error), etc.
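For instance, all three metrics can be computed side by side with scikit-learn. This is a minimal sketch; `y_true` and `y_pred` are hypothetical placeholders standing in for your test targets and NN predictions:

```python
# Compare predictions against targets with several regression metrics.
# The arrays below are illustrative, not the asker's data.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical test targets
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # hypothetical NN outputs

r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot
mse = mean_squared_error(y_true, y_pred)   # mean of squared residuals
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute residuals
```

None of these functions care whether the predictions came from a linear model or a neural network; they only compare predictions with targets.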

Haitao Du
  • 32,885
  • 17
  • 118
  • 213
  • Thank you for your reply. So you are saying that if I have $R^2 = 0.95$ and all the other metrics (MSE, MAE, etc.) are good as well, then $R^2$ is meaningful even for non-linear regression? – user3043636 Jan 25 '21 at 19:44
  • Should I use RMSE as the best metric in the non-linear case? – user3043636 Jan 25 '21 at 19:50
0

"Linearity" is not an issue here. You can most likely interpret your regression as a linear regression over non-linearly transformed variables. What matters is your loss function. If you're minimising the sum of squared errors, $R^2$ is the perfect measure of performance.

I disagree with Haitao: $MSE$ is redundant with $R^2$: $MSE = 0 \Leftrightarrow R^2 = 1$ and $MSE = Var(y) \Leftrightarrow R^2 = 0$. Mean absolute error is likely to be correlated, but a less suitable measure than $R^2$ -- unless, of course, your regression is minimising the sum of absolute errors.
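A small numerical sketch of that redundancy: for in-sample fit, $R^2 = 1 - MSE/\mathrm{Var}(y)$, so the two numbers carry the same information. The arrays below are illustrative only:

```python
# Demonstrate that R^2 is a rescaling of MSE: R^2 = 1 - MSE / Var(y).
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # illustrative targets
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 5.1])    # illustrative predictions

mse = np.mean((y - y_hat) ** 2)
var = np.mean((y - np.mean(y)) ** 2)           # population variance of y
r2 = 1 - mse / var                             # matches sklearn's r2_score
```

A perfect fit gives $MSE = 0$ and hence $R^2 = 1$; predicting the constant $\bar{y}$ gives $MSE = \mathrm{Var}(y)$ and hence $R^2 = 0$.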

Igor F.
  • 6,004
  • 1
  • 16
  • 41
0

I wouldn't really use $R^2$ for NN comparisons. Instead, true to the name of this forum, I would look at your cross-validation MSE (you could use a cross-validated $R^2$, but I would just use the raw MSE) or RMSE. The issue is that with NNs, random forests, or other non-parametric models, you can get inflated $R^2$ values.

The 'best' NN is then the one that minimizes your average MSE across all test folds. So I wouldn't say that a model is a 'good' model or not, just that it is the 'best' or most 'useful' of the ones you tried.
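As a hedged sketch of that selection procedure, assuming scikit-learn: `MLPRegressor` stands in for "a NN", the two candidate architectures are arbitrary, and the data are synthetic:

```python
# Rank candidate NNs by average cross-validated MSE across test folds.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score, KFold
from sklearn.neural_network import MLPRegressor

# Synthetic regression data, purely illustrative.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

candidates = {
    "small": MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
    "large": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
avg_mse = {}
for name, model in candidates.items():
    # scikit-learn reports negated MSE for scoring, so flip the sign back.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    avg_mse[name] = -scores.mean()

# "Best" here means lowest average test-fold MSE, nothing more.
best = min(avg_mse, key=avg_mse.get)
```

This only says which candidate generalizes best on these folds; it makes no claim that the winner is a "good" model in any absolute sense.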

And even for a linear model, a model with an $R^2$ of $0.95$ is, in my experience, still a coin flip as to whether it is any better than a model of the same data with an $R^2$ of $0.9$, so I would never call a model good based on $R^2$ alone.

Tylerr
  • 1,225
  • 5
  • 16