
I conducted a ridge regression with k-fold cross-validation. All predictors were scaled before the regression was fit.

To report the accuracy of my model's predictions, I calculated the MSE on the test sets. That is, for each of the k test sets, I calculated:

$$\operatorname{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
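The original code is not shown in the post, but the described procedure could be sketched roughly as follows (scikit-learn names assumed; the dataset, fold count, and penalty strength are placeholders):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the actual data
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

fold_mses = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Scale predictors using only the training fold to avoid leakage
    scaler = StandardScaler().fit(X[train_idx])
    model = Ridge(alpha=1.0).fit(scaler.transform(X[train_idx]), y[train_idx])
    y_pred = model.predict(scaler.transform(X[test_idx]))
    fold_mses.append(np.mean((y[test_idx] - y_pred) ** 2))  # mean((y - y_hat)^2)

print(np.mean(fold_mses))  # average test-set MSE across the k folds
```

The resulting number is on the squared scale of $y$, which is exactly the interpretability problem raised below.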

However, my problem is that this measure is not standardized: it depends on the scale of the $y$ variable, so I cannot tell whether a given value is low or high.

Therefore, my question is: Is there a legitimate way to standardize the MSE so that it can be compared across studies?

Thank you very much, Lior

  • Did you standardize y as well? – stefgehrig Jun 13 '20 at 14:53
  • This sounds like an XY problem where you want to be able to interpret the residuals (X) and surmise that standardizing the MSE (Y) would help you do that. Am I about right? (X and Y here are unrelated to the $X$ and $Y$ of regression fame.) – Dave Jun 13 '20 at 14:53
  • Thank you for the help. Stefgehrig, I did not standardize the Y. Dave, I think that you are about right, although I am not sure. What do you mean when you say that X and Y are not related to the X and Y in the regression frame? – Lior Jun 13 '20 at 16:16
  • An XY problem is a term from technical support where a person encounters issue X, attempts a solution, realizes she needs to solve Y to get that solution to work, and then asks about Y instead of X. That we happen to use $X$ and $Y$ in regression is total coincidence. So do you want to solve some problem, and your attempted solution involves standardizing MSE? – Dave Jun 13 '20 at 16:42

1 Answer


In a linear model, that is what $R^2$ does. The trouble is that it’s hard to say if $R^2$ is low or high, too! $R^2=0.30$ might be stellar for one study, while $R^2=0.90$ might be pitiful for another.

The reason I restrict this to linear models is how the total sum of squares decomposes into the regression sum of squares and the residual sum of squares (the latter related to the MSE). With nonlinear models, there is a third term that the traditional $R^2$ equation does not consider. In linear models, that third term is always $0$.
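A small numerical illustration of this decomposition (synthetic data and scikit-learn assumed; ordinary least squares with an intercept is used as the linear model):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=100)

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # residual sum of squares (n * MSE)

# For OLS with an intercept, SST = SSR + SSE exactly (the cross term is 0),
# so R^2 = 1 - SSE/SST is a scale-free rescaling of the MSE.
print(np.allclose(sst, ssr + sse))      # True
print(1 - sse / sst, model.score(X, y)) # both print the same R^2
```

For a nonlinear model, `sst` would generally not equal `ssr + sse`, which is why the usual $R^2$ formula no longer has its standardized interpretation there.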

I posted something about this last year. I’ll be back with a link.

EDIT

Look at the math in this post of mine where I decompose the total sum of squares for both linear and nonlinear models: Neural Net Regression SSE Loss. (That I’m discussing neural networks in particular is not so important.)
