Questions tagged [loss-functions]

A function used to quantify the difference between observed data and predicted values according to a model. Minimization of loss functions is a way to estimate the parameters of the model.

Examples include:

  • The (Root) Mean Squared Error, $\sqrt{\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2}$, used in "ordinary" regression or Ordinary Least Squares (OLS)
  • The Mean Absolute Error, $\frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i|$, frequently used in forecasting
  • "Hinge" losses, or linear losses where over- and underpredictions are weighted differently, for example in quantile regression
  • (Proper) scoring rules, used to compare predictive densities to actuals
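
For concreteness, a minimal NumPy sketch of the losses listed above (function names and the quantile default are illustrative, not taken from any particular library):

```python
import numpy as np

def rmse(y, y_hat):
    """(Root) Mean Squared Error, the OLS loss."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean Absolute Error, common in forecasting."""
    return np.mean(np.abs(y - y_hat))

def pinball(y, y_hat, tau=0.9):
    """Asymmetric linear loss: underpredictions are weighted by tau,
    overpredictions by (1 - tau); minimizing it targets the tau-quantile."""
    u = y - y_hat
    return np.mean(np.maximum(tau * u, (tau - 1) * u))

def log_score(y, p):
    """Logarithmic score, a proper scoring rule for predicted
    probabilities p of binary outcomes y."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```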
975 questions
113 votes · 6 answers

What loss function for multi-class, multi-label classification tasks in neural networks?

I'm training a neural network to classify a set of objects into n classes. Each object can belong to multiple classes at the same time (multi-class, multi-label). I read that for multi-class problems it is generally recommended to use softmax and…
aKzenT
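
A common approach for this setting (a sketch under the assumption that labels are independent yes/no decisions, not necessarily what any specific answer prescribes) is one sigmoid output per class trained with binary cross-entropy, so several classes can be active at once, unlike with a softmax:

```python
import numpy as np
from tensorflow import keras

n_features, n_classes = 20, 5  # illustrative sizes

# One sigmoid per class: each label is an independent yes/no decision.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(n_classes, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Targets are multi-hot vectors, e.g. an object in classes 0 and 3
# would be [1, 0, 0, 1, 0]; random data here just to exercise the API.
x = np.random.rand(8, n_features).astype("float32")
y = np.random.randint(0, 2, size=(8, n_classes)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```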
63 votes · 4 answers

Should I use a categorical cross-entropy or binary cross-entropy loss for binary predictions?

First of all, I realized that if I need to perform binary predictions, I have to create at least two classes through one-hot encoding. Is this correct? However, is binary cross-entropy only for predictions with only one class? If I were to…
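
For reference, a standard identity that usually settles this: with a single sigmoid output $p$ and label $y \in \{0,1\}$, binary cross-entropy is
$$-\big[y \log p + (1-y)\log(1-p)\big],$$
which is exactly categorical cross-entropy applied to the one-hot pair $(y, 1-y)$ and a two-class softmax output $(p, 1-p)$. For strictly binary problems the two losses therefore coincide, and no explicit one-hot encoding is required with the sigmoid formulation.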
59 votes · 5 answers

Which loss function is correct for logistic regression?

I read about two versions of the loss function for logistic regression, which of them is correct and why? From Machine Learning, Zhou Z.H (in Chinese), with $\beta = (w, b)\text{ and }\beta^Tx=w^Tx +b$: $$l(\beta) =…
xtt
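
The two formulations usually being compared are, with the augmented vector $\hat{x} = (x; 1)$ so that $\beta^T\hat{x} = w^Tx + b$: for labels $y_i \in \{0, 1\}$ (the convention in Zhou's book),
$$l(\beta) = \sum_{i=1}^m \left( -y_i \beta^T \hat{x}_i + \ln\left(1 + e^{\beta^T \hat{x}_i}\right) \right),$$
and for labels $y_i \in \{-1, +1\}$,
$$l(\beta) = \sum_{i=1}^m \ln\left(1 + e^{-y_i \beta^T \hat{x}_i}\right).$$
Both are the negative log-likelihood of the same Bernoulli model under different label encodings, so both are "correct": mapping $y \in \{-1,+1\}$ to $(y+1)/2 \in \{0,1\}$ turns one into the other term by term.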
58 votes · 4 answers

Cross Entropy vs. Sparse Cross Entropy: When to use one over the other

I am playing with convolutional neural networks using Keras + TensorFlow to classify categorical data. I have a choice of two loss functions: categorical_crossentropy and sparse_categorical_crossentropy. I have a good intuition about the…
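
The two losses compute the same quantity and differ only in the label format they expect; a minimal sketch (array values are illustrative):

```python
import numpy as np
from tensorflow import keras

# Integer class indices -> sparse_categorical_crossentropy
y_sparse = np.array([0, 2, 1])

# The same labels one-hot encoded -> categorical_crossentropy
y_onehot = keras.utils.to_categorical(y_sparse, num_classes=3)

# Some predicted class probabilities (rows sum to 1)
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.3, 0.5, 0.2]])

sparse = keras.losses.sparse_categorical_crossentropy(y_sparse, probs)
dense = keras.losses.categorical_crossentropy(y_onehot, probs)

# Identical per-sample losses; only the label format differs.
assert np.allclose(sparse.numpy(), dense.numpy())
```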
58 votes · 5 answers

Cost function of neural network is non-convex?

The cost function of a neural network is $J(W,b)$, and it is claimed to be non-convex. I don't quite understand why that is, since as far as I can see it's quite similar to the cost function of logistic regression, right? If it is non-convex, so the…
avocado
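
A standard argument for why this happens (independent of any particular answer): the set of minimizers of a convex function must be convex, but a network with a hidden layer has permutation symmetry. Swapping two hidden units, together with their incoming and outgoing weights, leaves $J(W,b)$ unchanged, so any minimizer has distinct, equally good permuted copies; a convex combination of two such copies generally computes a different function with strictly higher cost, which is impossible for a convex $J$. Logistic regression has no hidden layer, hence no such symmetry, and its cross-entropy cost is indeed convex.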
48 votes · 1 answer

What is the difference between a loss function and an error function?

Is the term "loss" synonymous with "error"? Is there a difference in definition? Also, what is the origin of the term "loss"? NB: The error function mentioned here is not to be confused with normal error.
Dan Kowalczyk
44 votes · 3 answers

Dice-coefficient loss function vs cross-entropy

When training a pixel segmentation neural network, such as a fully convolutional network, how do you decide between the cross-entropy loss function and the Dice-coefficient loss function? I realize this is a short question, but not quite…
Christian
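
For reference, the soft Dice loss as it is commonly written (a sketch, not any specific framework's implementation):

```python
import numpy as np

def soft_dice_loss(p, g, eps=1e-7):
    """Soft Dice loss for a single foreground class.

    p: predicted foreground probabilities; g: binary ground-truth mask.
    Dice overlap is 2|P ∩ G| / (|P| + |G|), so the loss is 0 at perfect
    overlap. Unlike per-pixel cross-entropy, it is normalized by region
    size, which matters when the foreground class is rare.
    """
    intersection = np.sum(p * g)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(p) + np.sum(g) + eps)
```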
40 votes · 4 answers

L1 regression estimates median whereas L2 regression estimates mean?

So I was asked a question about which central measures L1 (i.e., least-absolute-deviations) and L2 (i.e., least-squares) regression estimate. The answer is L1 = median and L2 = mean. Is there any intuitive reasoning behind this? Or does it have to be determined algebraically? If…
Bstat
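
The intuition can be made algebraic in a line each. For the L2 loss, setting the derivative to zero yields the mean:
$$\frac{d}{dc}\sum_{i=1}^n (y_i - c)^2 = -2\sum_{i=1}^n (y_i - c) = 0 \quad\Rightarrow\quad c = \frac{1}{n}\sum_{i=1}^n y_i.$$
For the L1 loss, wherever $c$ is not equal to any $y_i$,
$$\frac{d}{dc}\sum_{i=1}^n |y_i - c| = \#\{i : y_i < c\} - \#\{i : y_i > c\},$$
which vanishes only when $c$ splits the sample in half, i.e. at the median.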
40 votes · 1 answer

Training loss goes down and up again. What is happening?

My training loss goes down and then up again. It is very weird. The cross-validation loss tracks the training loss. What is going on? I have two stacked LSTMs as follows (in Keras): model = Sequential() model.add(LSTM(512, return_sequences=True,…
patapouf_ai
37 votes · 2 answers

Quantile regression: Loss function

I am trying to understand quantile regression, but one thing I struggle with is the choice of the loss function. $\rho_\tau(u) = u(\tau-1_{\{u<0\}})$ I know that the minimum of the expectation of $\rho_\tau(y-u)$ is equal to the…
CDO
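
The standard derivation, assuming $Y$ has a continuous CDF $F$: write out the expected loss and differentiate under the integral sign,
$$E[\rho_\tau(Y-u)] = (\tau - 1)\int_{-\infty}^{u} (y-u)\,dF(y) + \tau \int_{u}^{\infty} (y-u)\,dF(y),$$
$$\frac{d}{du}\,E[\rho_\tau(Y-u)] = (1-\tau)\,F(u) - \tau\,(1 - F(u)) = F(u) - \tau.$$
Setting this to zero gives $F(u) = \tau$, so the minimizer is exactly the $\tau$-th quantile of $Y$.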
35 votes · 3 answers

Gradient of Hinge loss

I'm trying to implement basic gradient descent and I'm testing it with a hinge loss function i.e. $l_{\text{hinge}} = \max(0,1-y\ \boldsymbol{x}\cdot\boldsymbol{w})$. However, I'm confused about the gradient of the hinge loss. I'm under the…
brcs
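
For reference, the hinge loss is not differentiable at the kink, so gradient descent actually uses a subgradient:
$$\frac{\partial\, l_{\text{hinge}}}{\partial \boldsymbol{w}} = \begin{cases} -y\,\boldsymbol{x} & \text{if } y\,\boldsymbol{x}\cdot\boldsymbol{w} < 1, \\ 0 & \text{if } y\,\boldsymbol{x}\cdot\boldsymbol{w} > 1, \end{cases}$$
and any $-t\,y\,\boldsymbol{x}$ with $t \in [0,1]$ is a valid subgradient at the kink $y\,\boldsymbol{x}\cdot\boldsymbol{w} = 1$; in practice either boundary choice works with the usual diminishing step sizes.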
35 votes · 3 answers

Training loss increases with time

I am training a model (a recurrent neural network) to classify 4 types of sequences. As the training runs, I see the training loss going down until the point where I correctly classify over 90% of the samples in my training batches. However, a couple of…
35 votes · 1 answer

XGBoost Loss function Approximation With Taylor Expansion

As an example, take the objective function of the XGBoost model on the $t$'th iteration: $$\mathcal{L}^{(t)}=\sum_{i=1}^n\ell(y_i,\hat{y}_i^{(t-1)}+f_t(\mathbf{x}_i))+\Omega(f_t)$$ where $\ell$ is the loss function, $f_t$ is the $t$'th tree output…
Alex R.
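
For reference, the second-order Taylor expansion from the XGBoost paper replaces $\ell$ with a quadratic in the new tree's output $f_t(\mathbf{x}_i)$ around the previous prediction:
$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^n \left[ \ell(y_i, \hat{y}_i^{(t-1)}) + g_i\, f_t(\mathbf{x}_i) + \tfrac{1}{2}\, h_i\, f_t^2(\mathbf{x}_i) \right] + \Omega(f_t),$$
where $g_i = \partial_{\hat{y}^{(t-1)}}\, \ell(y_i, \hat{y}_i^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, \ell(y_i, \hat{y}_i^{(t-1)})$. The first term is constant with respect to $f_t$ and is dropped when optimizing the $t$'th tree.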
31 votes · 5 answers

Yolo Loss function explanation

I am trying to understand the Yolo v2 loss function: \begin{align} &\lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^B \mathbb{1}_{ij}^{obj}[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2 ] \\&+ \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^B…
31 votes · 2 answers

Cost function in OLS linear regression

I'm a bit confused by a lecture on linear regression from Andrew Ng's Coursera course on machine learning. There, he gave a cost function that minimises the sum of squares as: $$ \frac{1}{2m} \sum _{i=1}^m \left(h_\theta(X^{(i)})-Y^{(i)}\right)^2…
SmallChess
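
The short answer usually given: positive constant factors do not change the argmin, the $\frac{1}{m}$ makes the cost an average that is comparable across sample sizes, and the extra $\frac{1}{2}$ exists purely so the derivative is clean:
$$\frac{\partial}{\partial \theta_j}\, \frac{1}{2m} \sum_{i=1}^m \left(h_\theta(X^{(i)}) - Y^{(i)}\right)^2 = \frac{1}{m} \sum_{i=1}^m \left(h_\theta(X^{(i)}) - Y^{(i)}\right) X_j^{(i)}$$
for the linear hypothesis $h_\theta(X) = \theta^T X$. Minimizing it is equivalent to minimizing the plain sum of squared errors.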