
Why take the square of the difference between the label and the prediction? What is the advantage of squaring?

Siju George
  • Parameter estimates have optimality properties (maximum likelihood) when the error terms in the model have independent and identically distributed normal distributions. Under other conditions they may have the property of minimum variance among unbiased estimators. But there are also conditions where the sum of squared errors is not optimal in any particular sense. – Michael R. Chernick Jul 24 '18 at 05:01
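The maximum-likelihood point in the comment can be checked with a short derivation. This assumes a model y_i = ŷ_i + ε_i with i.i.d. errors ε_i ~ N(0, σ²); the notation is illustrative, not taken from the thread. The Gaussian log-likelihood separates into a constant plus a multiple of the sum of squared errors:

```latex
\log L
= \sum_{i=1}^{n} \log\!\left( \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{(y_i - \hat{y}_i)^2}{2\sigma^2} \right) \right)
= -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
  - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

For fixed σ², the first term does not depend on the predictions. Maximizing log L is therefore the same as minimizing the sum of squared errors.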

1 Answer


If we don't square it, the difference can be negative, so positive and negative errors would cancel out when summed over the data.
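The cancellation problem shows up with just two hypothetical data points, one over-predicted and one under-predicted by the same amount. The raw differences sum to zero even though both predictions are wrong, while squaring (or taking absolute values) keeps every error positive:

```python
# Hypothetical labels and predictions: one under-prediction, one over-prediction
labels = [3.0, 5.0]
predictions = [4.0, 4.0]

diffs = [y - p for y, p in zip(labels, predictions)]  # [-1.0, 1.0]

raw_sum = sum(diffs)                      # errors cancel out
squared_sum = sum(d ** 2 for d in diffs)  # every error counts positively
abs_sum = sum(abs(d) for d in diffs)      # also avoids cancellation

print(raw_sum, squared_sum, abs_sum)  # prints: 0.0 2.0 2.0
```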

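The squared difference is also smooth and differentiable everywhere, which makes it convenient to optimize (for example, linear least squares has a closed-form solution), whereas the absolute difference has a kink at zero. Note, though, that squaring is less robust to outliers than the absolute difference, because large errors are penalized quadratically. There are lots of questions about this here; please search.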
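A small sketch compares the two losses on hypothetical labels that include one outlier. For a single constant prediction c, it brute-forces a grid of candidates and checks two things: the minimizer of the sum of squared errors is the mean, and the minimizer of the sum of absolute errors is the median. The grid search is for illustration only, not how one would fit a real model:

```python
import statistics

# Hypothetical labels with one outlier (100.0)
labels = [1.0, 2.0, 3.0, 4.0, 100.0]

def sse(c):
    """Sum of squared errors for a constant prediction c."""
    return sum((y - c) ** 2 for y in labels)

def sae(c):
    """Sum of absolute errors for a constant prediction c."""
    return sum(abs(y - c) for y in labels)

# Brute-force search over candidate constants 0.00, 0.01, ..., 100.00
candidates = [i / 100 for i in range(0, 10001)]
best_sq = min(candidates, key=sse)
best_abs = min(candidates, key=sae)

print(best_sq, statistics.mean(labels))     # prints: 22.0 22.0
print(best_abs, statistics.median(labels))  # prints: 3.0 3.0
```

Setting the derivative -2 Σ(y_i - c) to zero gives c = mean, which is why the squared loss is so easy to minimize. The outlier drags that minimizer to 22.0, while the absolute-error minimizer stays at 3.0.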

SmallChess