In machine learning, people talk about the objective function, the cost function, and the loss function. Are they just different names for the same thing? When should each be used? If they do not always refer to the same thing, what are the differences?
See also http://stats.stackexchange.com/questions/73221/what-is-a-loss-function-in-decision-theory – Nick Cox Oct 28 '15 at 18:05
9 Answers
These are not very strict terms and they are highly related. However:
- Loss function is usually a function defined on a data point, prediction and label, and measures the penalty. For example:
- square loss $l(f(x_i|\theta),y_i) = \left (f(x_i|\theta)-y_i \right )^2$, used in linear regression
- hinge loss $l(f(x_i|\theta), y_i) = \max(0, 1-f(x_i|\theta)y_i)$, used in SVM
- 0/1 loss $l(f(x_i|\theta), y_i) = \mathbb{1}[f(x_i|\theta) \neq y_i]$, used in theoretical analysis and in the definition of accuracy
- Cost function is usually more general. It might be a sum of loss functions over your training set plus some model complexity penalty (regularization). For example:
- Mean Squared Error $MSE(\theta) = \frac{1}{N} \sum_{i=1}^N \left (f(x_i|\theta)-y_i \right )^2$
- SVM cost function $SVM(\theta) = \|\theta\|^2 + C \sum_{i=1}^N \xi_i$ (there are additional constraints connecting the slack variables $\xi_i$ with $C$ and with the training set)
- Objective function is the most general term for any function that you optimize during training. For example, the probability of generating the training set in a maximum likelihood approach is a well-defined objective function, but it is neither a loss function nor a cost function (however, you could define an equivalent cost function). For example:
- MLE is a type of objective function (which you maximize)
- Divergence between classes can be an objective function, but it is hardly a cost function, unless you define something artificial, like 1 − Divergence, and name it a cost
Long story short, I would say that:
A loss function is a part of a cost function which is a type of an objective function.
All that being said, these terms are far from strict, and depending on the context, research group, or background, they can shift and be used with different meanings. The main (only?) common thread is that "loss" and "cost" functions are something one wants to minimise, while the objective function is something one wants to optimise (which can mean either maximisation or minimisation).
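To make the distinction concrete, here is a minimal sketch in Python/NumPy; the names (`square_loss`, `cost`) and the choice of an L2 penalty are illustrative assumptions, not a standard API:

```python
import numpy as np

def square_loss(y_pred, y):
    # Loss: penalty for a single data point (here, squared error).
    return (y_pred - y) ** 2

def cost(theta, X, y, lam=0.1):
    # Cost: mean loss over the training set plus a model complexity
    # penalty (L2 regularization).
    y_pred = X @ theta  # linear model f(x|theta)
    return np.mean(square_loss(y_pred, y)) + lam * np.sum(theta ** 2)

# Objective: whatever we hand to the optimizer. Here we minimize the cost;
# in an MLE setting we would instead maximize a likelihood.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.5, 1.5, 2.5])
print(cost(np.zeros(2), X, y))  # objective value at theta = 0
```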

+1. I've not seen a source for this but I have guessed that "objective" is the term used because it is your goal or objective to optimise that function, which could mean maximising something good or minimising something bad, although that difference is trivial, as any function can be negated. In contrast, the pejorative overtones of "loss" and "cost" do bite: I'd say it would be perverse to use either term except for something to be minimised. These points are tacit in your fine answer but deserve a little more emphasis. – Nick Cox Oct 28 '15 at 15:36
The "M" in "MLE" stands for "maximum", not "minimum". I only mention this pedantic detail because this question was migrated from Stack Overflow, and I've been bitten by the bug of minimizing the wrong function before. – Taylor Dec 26 '16 at 22:33
Actually, the objective function is the function (e.g. a linear function) you seek to optimize (usually by minimizing or maximizing) under the constraint of a loss function (e.g. L1, L2). Examples are ridge regression or SVM. You can also optimize the objective function without any loss function, e.g. simple OLS or logit. – g3o2 Jun 10 '17 at 16:54
@NickCox wrote 'the pejorative overtones of "loss" and "cost" do bite: I'd say it would be perverse to use either term except for something to be minimised'. I disagree: loss or cost can be maximized in order to find the worst possible case (subject to whatever constraints). This can be useful for worst-case analysis. – Mark L. Stone Jun 10 '17 at 22:40
I find it hard to keep the difference between "loss" and "cost" straight other than with rote memorization. The problem is that the English definitions of the words don't give any clues as to which should be which, nor is there any obvious mnemonic. Any suggestions welcome. – Stephen Sep 02 '17 at 21:27
@Stephen mnemonic: loss = lonely (one data point), cost = comprehensive. – Tom Hale Aug 11 '18 at 09:58
*MLE is a type of objective function (which you maximize)*. Actually, the likelihood is the function being maximized. MLE is the formula yielding the argument that maximizes the likelihood function, or the result of that formula when applied to a given dataset. Otherwise a nice answer. – Richard Hardy Nov 21 '19 at 10:38
Would accuracy in classification also be considered a cost function? – haneulkim May 12 '21 at 01:02
Arguably it depends on the use of accuracy: as a non-differentiable metric, it is rarely used as an objective and more commonly as an evaluation measure. That being said, it can definitely be seen as a cost function, and with some gradient-free optimisation it can be directly optimised too. All these naming conventions evolve and are ill-defined; unfortunately, machine learning is not mathematics, where things have one strict definition. – lejlot May 12 '21 at 09:13
Quoting from section 4.3 of the "Deep Learning" book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (emphasis in the original):
The function we want to minimize or maximize is called the objective function, or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function. In this book, we use these terms interchangeably, though some machine learning publications assign special meaning to some of these terms.
In this book at least, loss and cost are the same.

In Andrew Ng's words:
"Finally, the loss function was defined with respect to a single training example. It measures how well you're doing on a single training example. I'm now going to define something called the cost function, which measures how well you're doing on an entire training set. So the cost function J, which is applied to your parameters W and B, is going to be the average, one over m times the sum, of the loss function applied to each of the training examples in turn."

According to Prof. Andrew Ng (see slides on page 11),
The function h(X) represents your hypothesis. For fixed fitting parameters theta, it is a function of the features X. I'd say this can also be called the Objective Function.
The Cost function J is a function of the fitting parameters theta. J = J(theta).
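A minimal Python sketch of that distinction (illustrative names, a linear hypothesis assumed): for fixed theta, h is a function of the features; for a fixed training set, J is a function of theta only.

```python
import numpy as np

X = np.array([[1.0, 2.0], [1.0, 3.0]])  # features (with bias column)
y = np.array([5.0, 7.0])                # targets

def h(x, theta):
    # Hypothesis: for fixed theta, a function of the features x.
    return x @ theta

def J(theta):
    # Cost: for a fixed training set, a function of theta only.
    return np.mean((X @ theta - y) ** 2)

print(h(X[0], np.array([1.0, 2.0])))  # prediction for one example: 5.0
print(J(np.array([1.0, 2.0])))        # cost at this theta: 0.0
```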
According to Hastie et al.'s textbook "The Elements of Statistical Learning", p. 37:
"We seek a function f (X) for predicting Y given values of the input X." [...] the loss function L(Y, f(X)) is "a function for penalizing the errors in prediction",
So it seems "loss function" is a slightly more general term than "cost function". If you seek for "loss" in that PDF, I think that they use "cost function" and "loss function" somewhat synonymously.
Indeed, on p. 502:
"The situation [in Clustering] is somewhat similar to the specification of a loss or cost function in prediction problems (supervised learning)".
Maybe these terms exist because they evolved independently in different academic communities. "Objective Function" is an old term used in Operations Research and Engineering Mathematics. "Loss function" might be more in use among statisticians. But I'm speculating here.

The loss function is nowhere near being "more general" than the cost function. f(X) is in particular a function of your parameters (thus J(theta)), making it (the loss function) a particular type of cost function. Furthermore, Hastie makes a simplification there: he assumes **additive loss functions**, which create a particular class of **cost functions**. – lejlot Oct 28 '15 at 10:13
I just tried to answer this question with references from the academic literature, sources which are easy to understand. Your point about "additive loss functions" might be right, but it is well beyond the scope of the question asked, and I can't find this specific term in the ESL book. – knb Oct 28 '15 at 11:47
Is this "I'd say" from Ng or you? h is the model (h for hypothesis). The objective is that h performs well. The objective function measures how well h does and is usually different from h. – Joachim Wagner Feb 01 '18 at 17:58
The loss function computes the error for a single training example, while the cost function is the average of the loss function over the entire training set.

The terms cost function and loss function are synonymous; some people also call it the error function. The more general scenario is to first define an objective function that we want to optimize. This objective function could be to:
- maximize the posterior probabilities (e.g., naive Bayes)
- maximize a fitness function (genetic programming)
- maximize the total reward/value function (reinforcement learning)
- maximize information gain/minimize child node impurities (CART decision tree classification)
- minimize a mean squared error cost (or loss) function (CART, decision tree regression, linear regression, adaptive linear neurons, …)
- maximize the log-likelihood or minimize the cross-entropy loss (or cost) function
- minimize hinge loss (support vector machine)
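For instance, the log-likelihood and cross-entropy items in the list above are two views of the same objective: maximizing one is minimizing the other. A minimal Python sketch (the variable names are illustrative) makes this explicit:

```python
import numpy as np

y = np.array([1, 0, 1, 1])          # binary labels
p = np.array([0.9, 0.2, 0.7, 0.6])  # predicted P(y = 1)

# Per-example log-likelihood terms under the predicted probabilities.
ll_terms = y * np.log(p) + (1 - y) * np.log(1 - p)

log_likelihood = np.sum(ll_terms)   # objective to MAXIMIZE
cross_entropy = -np.mean(ll_terms)  # cost to MINIMIZE

# Up to averaging, one is just the negation of the other.
assert np.isclose(cross_entropy, -log_likelihood / len(y))
```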

To put it simply: if you have $m$ training examples $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$, then the loss function $L(\hat{y}, y)$ measures the error between the prediction $\hat{y}$ and the label $y$ for a single training example, while the cost function measures that error over the whole training set.
Note: $\hat{y}$ means the output from our model and $y$ means the expected output.
Note: credit goes to Andrew Ng; source: the Coursera course "Neural Networks and Deep Learning".

To give a short answer: in my view, they are synonymous. However, "cost function" is used more in optimization problems, and "loss function" is used in parameter estimation.

How about the score function?
This is not directly related to the question, but I wanted to add it here to give a complete reference for all of these computational terminologies.
In statistics, the score (or informant) is the gradient of the log-likelihood function with respect to the parameter vector.
This term is used specifically with maximum likelihood estimation in the econometric modeling field. So the score function can also be considered an objective function.
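For completeness: with log-likelihood $\ell(\theta) = \log L(\theta \mid x)$, the score is $s(\theta) = \nabla_{\theta}\, \ell(\theta)$, and the maximum likelihood estimate $\hat{\theta}$ is characterized by the score vanishing there, $s(\hat{\theta}) = 0$.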
