
I am currently studying the statistics behind machine learning, and my question is: why is the mean squared error a good choice for the loss function? Does it have good statistical properties (unbiasedness, consistency)? Or practical advantages?

Someone told me that it is unbiased, but I still do not see why it is even an estimator, since it is rather a function that evaluates your point estimator. I would appreciate any clarification, since I could not find anything online about this topic.

kjetil b halvorsen
Patricio

  • Related: [Why square the difference instead of taking the absolute value in standard deviation?](https://stats.stackexchange.com/questions/118/why-square-the-difference-instead-of-taking-the-absolute-value-in-standard-devia) – user2974951 Nov 04 '21 at 10:57
  • [Mean and Median properties](https://stats.stackexchange.com/questions/7307/mean-and-median-properties) – user2974951 Nov 04 '21 at 10:58
  • Unbiased for what? – Dave Nov 04 '21 at 10:58
  • There are no silver bullets. There is no “one metric to rule them all”. We are mechanics in the garage, and we look to have toolboxes with a decent diversity of tools, where we have decent familiarity with each of them. Every metric is a dog metric. There is always a place where each of them falls on its own sword. There is always a place where each of them stands tall. It takes familiarity with both the place and the metric to determine the fit; that is why it is a practice, a little bit of artisanship. – EngrStudent Nov 04 '21 at 11:04
  • I totally agree with all the comments. I just got that question in an interview, and then the interviewer said something about likelihood, unbiasedness, and estimators, and that a lot of people forget about the statistical properties of MSE (he studied at Cambridge). But I could not find anything related, which is why I asked here. – Patricio Nov 04 '21 at 12:01

1 Answer


Based on your last comment about “likelihood”, “unbiased”, and “estimator”, I think I know what your interviewer meant.

When you assume i.i.d. Gaussian error terms in linear regression, which is a common assumption, minimizing the square loss gives the same solution as maximum likelihood estimation of the regression parameters. That is:

$$ \hat\beta_{MLE}=\hat\beta_{OLS}=(X^TX)^{-1}X^Ty $$
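
A quick numerical sketch of this equivalence (not from the original answer; the data, seed, and dimensions are arbitrary illustrations): minimizing the Gaussian negative log-likelihood numerically recovers the closed-form OLS solution.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)  # iid Gaussian errors

# Closed-form OLS solution: (X^T X)^{-1} X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# With known error variance, the Gaussian negative log-likelihood is,
# up to additive and multiplicative constants, the sum of squared
# residuals, so its minimizer is also the MLE.
def neg_log_lik(beta):
    return 0.5 * np.sum((y - X @ beta) ** 2)

beta_mle = minimize(neg_log_lik, x0=np.zeros(p)).x
print(np.allclose(beta_ols, beta_mle, atol=1e-4))  # True
```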

Further, even under milder conditions (the error terms need not be Gaussian, only zero-mean, homoscedastic, and uncorrelated), the Gauss–Markov theorem says the OLS solution is the best linear unbiased estimator (BLUE), where “best” means lowest variance among linear, unbiased estimators.
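
To see the unbiasedness part concretely, here is a small simulation sketch (my own illustration, with an arbitrary design and seed) using deliberately non-Gaussian errors that still satisfy the Gauss–Markov conditions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, n_sims = 100, 2, 5000
X = rng.normal(size=(n, p))  # fixed design across simulations
beta_true = np.array([2.0, -1.0])

# Uniform errors: decidedly non-Gaussian, but zero-mean with finite
# variance, so the Gauss-Markov conditions still hold.
estimates = np.empty((n_sims, p))
for i in range(n_sims):
    y = X @ beta_true + rng.uniform(-1.0, 1.0, size=n)
    estimates[i] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))  # ~ [2.0, -1.0]: OLS remains unbiased
```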

However, if you drop either the linearity requirement or the unbiasedness requirement, you can obtain estimators with even lower variance. For those (and probably other) reasons, square loss might not always be the best estimation method. As another comment remarked, there is no silver bullet.
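
Ridge regression is the standard example of trading bias for lower variance. A rough simulation sketch (again my own illustration; the penalty `lam` and the true coefficients are arbitrary choices, and whether ridge wins depends on them):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, n_sims, lam = 30, 5, 5000, 5.0
X = rng.normal(size=(n, p))
beta_true = np.full(p, 0.3)  # small coefficients favour shrinkage
I = np.eye(p)

ols = np.empty((n_sims, p))
ridge = np.empty((n_sims, p))
for i in range(n_sims):
    y = X @ beta_true + rng.normal(size=n)
    ols[i] = np.linalg.solve(X.T @ X, X.T @ y)
    # Ridge deliberately shrinks toward zero: biased, but lower variance.
    ridge[i] = np.linalg.solve(X.T @ X + lam * I, X.T @ y)

def est_mse(est):
    """Mean squared estimation error, E||beta_hat - beta||^2."""
    return np.mean(np.sum((est - beta_true) ** 2, axis=1))

print(est_mse(ols), est_mse(ridge))  # ridge comes out lower here
```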

Dave
  • +1. This is a very good answer from a statistical perspective. It is also worth noting that the loss is a convex function and, assuming $X$ is not rank deficient, has a unique minimum, making optimization a straightforward problem to solve (see the convexity check sketched after this thread). Additionally, MSE is a proper scoring rule, as compared to something like accuracy or AUC (I know we're talking about linear regression, but we could just as easily fit a logistic regression by minimizing the Brier score; there are just problems with the gradient were we to do that). Lots of reasons to like MSE. – Demetri Pananos Nov 04 '21 at 13:59
  • Thank you both very much. I think this is exactly what he meant. :) – Patricio Nov 06 '21 at 18:49
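
As an appendix to Demetri Pananos's convexity point above, a minimal check (synthetic data, illustrative only): the Hessian of the square loss is $2X^TX$, which is positive definite whenever $X$ has full column rank, so the loss is strictly convex with a unique minimum.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))

# The Hessian of the square loss ||y - Xb||^2 is 2 * X^T X, which does
# not depend on b. If X has full column rank, its smallest eigenvalue
# is strictly positive, so the loss is strictly convex.
H = 2 * X.T @ X
print(np.linalg.eigvalsh(H).min() > 0)  # True whenever X is full rank
```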