
So my goal is to minimize

$$\frac{1}{n} \sum_{i=1}^n (y'_i - y_i)^2$$

where $y'_i$ is the output of the network and $y_i$ is the target label.

I have two questions:

  1. What is the name of this minimization function? (Least sum of squares?)

  2. If I want to implement it in neural networks, what loss function do I use?

Thank you!

YohanRoth

2 Answers


This is called mean-squared-error loss.
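
As a rough illustration of your question 2, here is how this loss might look in code. This is a minimal sketch assuming PyTorch; the architecture, data shapes, and learning rate are placeholders for the example, not something from your question:

```python
import torch
import torch.nn as nn

# Hypothetical regression network; the architecture is arbitrary.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# nn.MSELoss with the default reduction computes (1/n) * sum((y' - y)^2),
# exactly the quantity in the question.
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Dummy data: 64 examples with 10 features and real-valued targets.
x = torch.randn(64, 10)
y = torch.randn(64, 1)

optimizer.zero_grad()
y_pred = model(x)          # y' in the question's notation
loss = loss_fn(y_pred, y)  # mean squared error
loss.backward()
optimizer.step()
```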

If you try to use this loss, and train the model with gradient descent, you may run into a problem. This is because it sounds like you have a classification task, since you write about "labels". A neural network for classification with no hidden layer and softmax outputs is exactly a logistic regression. If you attempt to use mean-squared-error loss to estimate a logistic regression, you'll run into problems because this optimization task is not convex.
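
To make the non-convexity concrete (using my own notation, with $\sigma(z) = 1/(1 + e^{-z})$ the sigmoid and $w$ the weights): the squared loss for logistic regression is

$$\frac{1}{n} \sum_{i=1}^n \left(\sigma(w^\top x_i) - y_i\right)^2,$$

which is not convex in $w$ because the nonlinear sigmoid sits inside the square. The usual cross-entropy loss,

$$-\frac{1}{n} \sum_{i=1}^n \Big[ y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\big(1 - \sigma(w^\top x_i)\big) \Big],$$

is convex in $w$, which is one reason it is the standard choice.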

For more information, see What is happening here, when I use squared loss in logistic regression setting?

Sycorax
  1. This equation is known as mean squared error (mean squared loss).

  2. This equation is appropriate when $y'_i$ and $y_i$ are both real values (a regression problem): we try to minimize the mean squared difference between the actual and predicted values. In a classification problem it breaks down. There, $y_i$ takes only 2 values (for binary classification) or $n$ values, where $n$ is the number of classes, and the goal is to correctly estimate the probability of each class; squared error does not handle errors in probabilities well. If you apply this equation to a classification problem you are likely to run into trouble, so a probabilistic loss such as cross-entropy is used instead (sketched below).
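
As a sketch of that standard alternative (assuming PyTorch; the class count and shapes are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical 3-class classifier; the architecture is arbitrary.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

# CrossEntropyLoss expects raw logits and integer class labels;
# it applies log-softmax internally, unlike MSELoss.
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 10)         # 64 examples, 10 features
y = torch.randint(0, 3, (64,))  # class labels in {0, 1, 2}

logits = model(x)               # shape (64, 3)
loss = loss_fn(logits, y)
loss.backward()
```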