Questions tagged [loss-functions]

A function used to quantify the difference between observed data and predicted values according to a model. Minimization of loss functions is a way to estimate the parameters of the model.

Examples include:

  • The (Root) Mean Squared Error, $\sqrt{\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2}$, used in "ordinary" regression or Ordinary Least Squares (OLS)
  • The Mean Absolute Error, $\frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i|$, frequently used in forecasting
  • "Hinge" losses, or linear losses where over- and underpredictions are weighted differently, for example in quantile regression
  • (Proper) scoring rules, used to compare predictive densities to actuals
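
For concreteness, a minimal NumPy sketch of the losses listed above (function names and the quantile default are illustrative, not taken from any particular library):

```python
import numpy as np

def rmse(y, y_hat):
    """(Root) Mean Squared Error, the OLS loss."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean Absolute Error, common in forecasting."""
    return np.mean(np.abs(y - y_hat))

def pinball(y, y_hat, tau=0.9):
    """Asymmetric linear loss: underpredictions are weighted by tau,
    overpredictions by (1 - tau); minimizing it targets the tau-quantile."""
    u = y - y_hat
    return np.mean(np.maximum(tau * u, (tau - 1) * u))

def log_score(y, p):
    """Logarithmic score, a proper scoring rule for predicted
    probabilities p of binary outcomes y."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```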
975 questions
113 votes · 6 answers

What loss function for multi-class, multi-label classification tasks in neural networks?

I'm training a neural network to classify a set of objects into n classes. Each object can belong to multiple classes at the same time (multi-class, multi-label). I read that for multi-class problems it is generally recommended to use softmax and…
aKzenT
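
A common approach for this setting (a sketch under the assumption that labels are independent yes/no decisions, not necessarily what any specific answer prescribes) is one sigmoid output per class trained with binary cross-entropy, so several classes can be active at once, unlike with a softmax:

```python
import numpy as np
from tensorflow import keras

n_features, n_classes = 20, 5  # illustrative sizes

# One sigmoid per class: each label is an independent yes/no decision.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(n_classes, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Targets are multi-hot vectors, e.g. an object in classes 0 and 3
# would be [1, 0, 0, 1, 0]; random data here just to exercise the API.
x = np.random.rand(8, n_features).astype("float32")
y = np.random.randint(0, 2, size=(8, n_classes)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```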
63 votes · 4 answers

Should I use a categorical cross-entropy or binary cross-entropy loss for binary predictions?

First of all, I realized that if I need to perform binary predictions, I have to create at least two classes through one-hot encoding. Is this correct? However, is binary cross-entropy only for predictions with only one class? If I were to…
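
For reference, a standard identity that usually settles this: with a single sigmoid output $p$ and label $y \in \{0,1\}$, binary cross-entropy is
$$-\big[y \log p + (1-y)\log(1-p)\big],$$
which is exactly categorical cross-entropy applied to the one-hot pair $(y, 1-y)$ and a two-class softmax output $(p, 1-p)$. For strictly binary problems the two losses therefore coincide, and no explicit one-hot encoding is required with the sigmoid formulation.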
59 votes · 5 answers

Which loss function is correct for logistic regression?

I read about two versions of the loss function for logistic regression, which of them is correct and why? From Machine Learning, Zhou Z.H (in Chinese), with $\beta = (w, b)\text{ and }\beta^Tx=w^Tx +b$: $$l(\beta) =…
xtt
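
The two formulations usually being compared are, with the augmented vector $\hat{x} = (x; 1)$ so that $\beta^T\hat{x} = w^Tx + b$: for labels $y_i \in \{0, 1\}$ (the convention in Zhou's book),
$$l(\beta) = \sum_{i=1}^m \left( -y_i \beta^T \hat{x}_i + \ln\left(1 + e^{\beta^T \hat{x}_i}\right) \right),$$
and for labels $y_i \in \{-1, +1\}$,
$$l(\beta) = \sum_{i=1}^m \ln\left(1 + e^{-y_i \beta^T \hat{x}_i}\right).$$
Both are the negative log-likelihood of the same Bernoulli model under different label encodings, so both are "correct": mapping $y \in \{-1,+1\}$ to $(y+1)/2 \in \{0,1\}$ turns one into the other term by term.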
58 votes · 4 answers

Cross Entropy vs. Sparse Cross Entropy: When to use one over the other

I am playing with convolutional neural networks using Keras + TensorFlow to classify categorical data. I have a choice of two loss functions: categorical_crossentropy and sparse_categorical_crossentropy. I have a good intuition about the…
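
The two losses compute the same quantity and differ only in the label format they expect; a minimal sketch (array values are illustrative):

```python
import numpy as np
from tensorflow import keras

# Integer class indices -> sparse_categorical_crossentropy
y_sparse = np.array([0, 2, 1])

# The same labels one-hot encoded -> categorical_crossentropy
y_onehot = keras.utils.to_categorical(y_sparse, num_classes=3)

# Some predicted class probabilities (rows sum to 1)
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.3, 0.5, 0.2]])

sparse = keras.losses.sparse_categorical_crossentropy(y_sparse, probs)
dense = keras.losses.categorical_crossentropy(y_onehot, probs)

# Identical per-sample losses; only the label format differs.
assert np.allclose(sparse.numpy(), dense.numpy())
```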
58 votes · 5 answers

Cost function of neural network is non-convex?

The cost function of a neural network is $J(W,b)$, and it is claimed to be non-convex. I don't quite understand why that is, since as far as I can see it's quite similar to the cost function of logistic regression, right? If it is non-convex, so the…
avocado
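
A standard argument for why this happens (independent of any particular answer): the set of minimizers of a convex function must be convex, but a network with a hidden layer has permutation symmetry. Swapping two hidden units, together with their incoming and outgoing weights, leaves $J(W,b)$ unchanged, so any minimizer has distinct, equally good permuted copies; a convex combination of two such copies generally computes a different function with strictly higher cost, which is impossible for a convex $J$. Logistic regression has no hidden layer, hence no such symmetry, and its cross-entropy cost is indeed convex.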
48 votes · 1 answer

What is the difference between a loss function and an error function?

Is the term "loss" synonymous with "error"? Is there a difference in definition? Also, what is the origin of the term "loss"? NB: The error function mentioned here is not to be confused with normal error.
Dan Kowalczyk
44 votes · 3 answers

Dice-coefficient loss function vs cross-entropy

When training a pixel segmentation neural network, such as a fully convolutional network, how do you decide between the cross-entropy loss function and the Dice-coefficient loss function? I realize this is a short question, but not quite…
Christian
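
For reference, the soft Dice loss as it is commonly written (a sketch, not any specific framework's implementation):

```python
import numpy as np

def soft_dice_loss(p, g, eps=1e-7):
    """Soft Dice loss for a single foreground class.

    p: predicted foreground probabilities; g: binary ground-truth mask.
    Dice overlap is 2|P ∩ G| / (|P| + |G|), so the loss is 0 at perfect
    overlap. Unlike per-pixel cross-entropy, it is normalized by region
    size, which matters when the foreground class is rare.
    """
    intersection = np.sum(p * g)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(p) + np.sum(g) + eps)
```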
40 votes · 4 answers

L1 regression estimates median whereas L2 regression estimates mean?

So I was asked a question about which central measures L1 (i.e., least-absolute-deviations) and L2 (i.e., least-squares) regression estimate. The answer is L1 = median and L2 = mean. Is there any intuitive reasoning behind this? Or does it have to be determined algebraically? If…
Bstat
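
The intuition can be made algebraic in a line each. For the L2 loss, setting the derivative to zero yields the mean:
$$\frac{d}{dc}\sum_{i=1}^n (y_i - c)^2 = -2\sum_{i=1}^n (y_i - c) = 0 \quad\Rightarrow\quad c = \frac{1}{n}\sum_{i=1}^n y_i.$$
For the L1 loss, wherever $c$ is not equal to any $y_i$,
$$\frac{d}{dc}\sum_{i=1}^n |y_i - c| = \#\{i : y_i < c\} - \#\{i : y_i > c\},$$
which vanishes only when $c$ splits the sample in half, i.e. at the median.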
40 votes · 1 answer

Training loss goes down and up again. What is happening?

My training loss goes down and then up again. It is very weird. The cross-validation loss tracks the training loss. What is going on? I have two stacked LSTMs as follows (in Keras): model = Sequential() model.add(LSTM(512, return_sequences=True,…
patapouf_ai
37 votes · 2 answers

Quantile regression: Loss function

I am trying to understand quantile regression, but one thing I struggle with is the choice of the loss function. $\rho_\tau(u) = u(\tau-1_{\{u<0\}})$ I know that the minimum of the expectation of $\rho_\tau(y-u)$ is equal to the…
CDO
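
The standard derivation, assuming $Y$ has a continuous CDF $F$: write out the expected loss and differentiate under the integral sign,
$$E[\rho_\tau(Y-u)] = (\tau - 1)\int_{-\infty}^{u} (y-u)\,dF(y) + \tau \int_{u}^{\infty} (y-u)\,dF(y),$$
$$\frac{d}{du}\,E[\rho_\tau(Y-u)] = (1-\tau)\,F(u) - \tau\,(1 - F(u)) = F(u) - \tau.$$
Setting this to zero gives $F(u) = \tau$, so the minimizer is exactly the $\tau$-th quantile of $Y$.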
35 votes · 3 answers

Gradient of Hinge loss

I'm trying to implement basic gradient descent and I'm testing it with a hinge loss function i.e. $l_{\text{hinge}} = \max(0,1-y\ \boldsymbol{x}\cdot\boldsymbol{w})$. However, I'm confused about the gradient of the hinge loss. I'm under the…
brcs
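
For reference, the hinge loss is not differentiable at the kink, so gradient descent actually uses a subgradient:
$$\frac{\partial\, l_{\text{hinge}}}{\partial \boldsymbol{w}} = \begin{cases} -y\,\boldsymbol{x} & \text{if } y\,\boldsymbol{x}\cdot\boldsymbol{w} < 1, \\ 0 & \text{if } y\,\boldsymbol{x}\cdot\boldsymbol{w} > 1, \end{cases}$$
and any $-t\,y\,\boldsymbol{x}$ with $t \in [0,1]$ is a valid subgradient at the kink $y\,\boldsymbol{x}\cdot\boldsymbol{w} = 1$; in practice either boundary choice works with the usual diminishing step sizes.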
35 votes · 3 answers

Training loss increases with time

I am training a model (a recurrent neural network) to classify 4 types of sequences. As the training runs, I see the training loss going down until the point where I correctly classify over 90% of the samples in my training batches. However, a couple of…
35 votes · 1 answer

XGBoost Loss function Approximation With Taylor Expansion

As an example, take the objective function of the XGBoost model on the $t$'th iteration: $$\mathcal{L}^{(t)}=\sum_{i=1}^n\ell(y_i,\hat{y}_i^{(t-1)}+f_t(\mathbf{x}_i))+\Omega(f_t)$$ where $\ell$ is the loss function, $f_t$ is the $t$'th tree output…
Alex R.
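
For reference, the second-order Taylor expansion from the XGBoost paper replaces $\ell$ with a quadratic in the new tree's output $f_t(\mathbf{x}_i)$ around the previous prediction:
$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^n \left[ \ell(y_i, \hat{y}_i^{(t-1)}) + g_i\, f_t(\mathbf{x}_i) + \tfrac{1}{2}\, h_i\, f_t^2(\mathbf{x}_i) \right] + \Omega(f_t),$$
where $g_i = \partial_{\hat{y}^{(t-1)}}\, \ell(y_i, \hat{y}_i^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, \ell(y_i, \hat{y}_i^{(t-1)})$. The first term is constant with respect to $f_t$ and is dropped when optimizing the $t$'th tree.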
31 votes · 5 answers

Yolo Loss function explanation

I am trying to understand the Yolo v2 loss function: \begin{align} &\lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^B \mathbb{1}_{ij}^{obj}[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 ] \\&+ \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^B…
31 votes · 2 answers

Cost function in OLS linear regression

I'm a bit confused by a lecture on linear regression from Andrew Ng's Coursera course on machine learning. There, he gave a cost function that minimises the sum of squares as: $$ \frac{1}{2m} \sum _{i=1}^m \left(h_\theta(X^{(i)})-Y^{(i)}\right)^2…
SmallChess
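
The short answer usually given: positive constant factors do not change the argmin, the $\frac{1}{m}$ makes the cost an average that is comparable across sample sizes, and the extra $\frac{1}{2}$ exists purely so the derivative is clean:
$$\frac{\partial}{\partial \theta_j}\, \frac{1}{2m} \sum_{i=1}^m \left(h_\theta(X^{(i)}) - Y^{(i)}\right)^2 = \frac{1}{m} \sum_{i=1}^m \left(h_\theta(X^{(i)}) - Y^{(i)}\right) X_j^{(i)}$$
for the linear hypothesis $h_\theta(X) = \theta^T X$. Minimizing it is equivalent to minimizing the plain sum of squared errors.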