
I am currently studying the statistics behind machine learning, and my question is: why is the mean squared error a good choice for the loss function? Does it have good statistical properties (unbiasedness, consistency)? Or practical advantages?

Someone told me that it is unbiased, but I still do not see why it is even an estimator, since it is rather a function that evaluates your point estimator. I would appreciate any clarification, since I could not find anything online about this topic.

kjetil b halvorsen
Patricio

  • Related: [Why square the difference instead of taking the absolute value in standard deviation?](https://stats.stackexchange.com/questions/118/why-square-the-difference-instead-of-taking-the-absolute-value-in-standard-devia) – user2974951 Nov 04 '21 at 10:57
  • [Mean and Median properties](https://stats.stackexchange.com/questions/7307/mean-and-median-properties) – user2974951 Nov 04 '21 at 10:58
  • Unbiased for what? – Dave Nov 04 '21 at 10:58
  • There are no silver bullets. There is no “one metric to rule them all”. We are mechanics in the garage, and we look to have toolboxes with a decent diversity of tools, where we have decent familiarity with each of them. Every metric is a dog metric. There is always a place where each of them falls on its own sword. There is always a place where each of them stands tall. It takes familiarity with both the place and the metric to determine the fit; that is why it is a practice, a little bit of artisanship. – EngrStudent Nov 04 '21 at 11:04
  • I totally agree with all the comments. I just got that question in an interview, and then the interviewer said something about likelihood, unbiasedness, and estimators, and that a lot of people forget about the statistical properties of MSE (he studied at Cambridge). But I could not find anything related, which is why I asked here. – Patricio Nov 04 '21 at 12:01

1 Answer


Based on your last comment about “likelihood”, “unbiased”, and “estimator”, I think I know what your interviewer meant.

When you assume i.i.d. Gaussian error terms in linear regression, which is a common assumption, minimizing the square loss gives the same solution as maximum likelihood estimation of the regression parameters. That is:

$$ \hat\beta_{MLE}=\hat\beta_{OLS}=(X^TX)^{-1}X^Ty $$
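
A quick numerical sketch of this equivalence (not from the original answer; the data, seed, and dimensions are arbitrary illustrations): minimizing the Gaussian negative log-likelihood numerically recovers the closed-form OLS solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)  # iid Gaussian errors

# Closed-form OLS solution: (X^T X)^{-1} X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# With known error variance, the Gaussian negative log-likelihood is,
# up to additive and multiplicative constants, the sum of squared
# residuals, so its minimizer is also the MLE.
def neg_log_lik(beta):
    return 0.5 * np.sum((y - X @ beta) ** 2)

beta_mle = minimize(neg_log_lik, x0=np.zeros(p)).x
print(np.allclose(beta_ols, beta_mle, atol=1e-4))  # True
```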

Further, even under milder conditions (the error terms need not be Gaussian, only zero-mean, homoscedastic, and uncorrelated), the Gauss–Markov theorem says the OLS solution is the best linear unbiased estimator (BLUE), where “best” means lowest variance among linear, unbiased estimators.
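
To see the unbiasedness part concretely, here is a small simulation sketch (my own illustration, with an arbitrary design and seed) using deliberately non-Gaussian errors that still satisfy the Gauss–Markov conditions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, n_sims = 100, 2, 5000
X = rng.normal(size=(n, p))  # fixed design across simulations
beta_true = np.array([2.0, -1.0])

# Uniform errors: decidedly non-Gaussian, but zero-mean with finite
# variance, so the Gauss-Markov conditions still hold.
estimates = np.empty((n_sims, p))
for i in range(n_sims):
    y = X @ beta_true + rng.uniform(-1.0, 1.0, size=n)
    estimates[i] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))  # ~ [2.0, -1.0]: OLS remains unbiased
```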

However, if you drop either the linearity requirement or the unbiasedness requirement, you can obtain estimators with even lower variance. For those (and probably other) reasons, square loss might not always be the best estimation method. As another comment remarked, there is no silver bullet.
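
Ridge regression is the standard example of trading bias for lower variance. A rough simulation sketch (again my own illustration; the penalty `lam` and the true coefficients are arbitrary choices, and whether ridge wins depends on them):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, n_sims, lam = 30, 5, 5000, 5.0
X = rng.normal(size=(n, p))
beta_true = np.full(p, 0.3)  # small coefficients favour shrinkage
I = np.eye(p)

ols = np.empty((n_sims, p))
ridge = np.empty((n_sims, p))
for i in range(n_sims):
    y = X @ beta_true + rng.normal(size=n)
    ols[i] = np.linalg.solve(X.T @ X, X.T @ y)
    # Ridge deliberately shrinks toward zero: biased, but lower variance.
    ridge[i] = np.linalg.solve(X.T @ X + lam * I, X.T @ y)

def est_mse(est):
    """Mean squared estimation error, E||beta_hat - beta||^2."""
    return np.mean(np.sum((est - beta_true) ** 2, axis=1))

print(est_mse(ols), est_mse(ridge))  # ridge comes out lower here
```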

Dave
  • +1. This is a very good answer from a statistical perspective. It is also worth noting that the loss is a convex function and, assuming $X$ is not rank deficient, has a unique minimum, making optimization a straightforward problem to solve (see the convexity check sketched after this thread). Additionally, MSE is a proper scoring rule, as compared to something like accuracy or AUC (I know we're talking about linear regression, but we could just as easily fit a logistic regression by minimizing the Brier score; there are just problems with the gradient were we to do that). Lots of reasons to like MSE. – Demetri Pananos Nov 04 '21 at 13:59
  • Thank you both very much. I think this is exactly what he meant. :) – Patricio Nov 06 '21 at 18:49
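
As an appendix to Demetri Pananos's convexity point above, a minimal check (synthetic data, illustrative only): the Hessian of the square loss is $2X^TX$, which is positive definite whenever $X$ has full column rank, so the loss is strictly convex with a unique minimum.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))

# The Hessian of the square loss ||y - Xb||^2 is 2 * X^T X, which does
# not depend on b. If X has full column rank, its smallest eigenvalue
# is strictly positive, so the loss is strictly convex.
H = 2 * X.T @ X
print(np.linalg.eigvalsh(H).min() > 0)  # True whenever X is full rank
```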