I guess it probably is, but just want a confirmation. Thanks!
1 Answer
From Wikipedia's Loss function: "In mathematical optimization, statistics, decision theory and machine learning, a loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (sometimes called a reward function, a profit function, a utility function, a fitness function, etc.), in which case it is to be maximized."
On this site we have a good explanation of what AIC is. That is, ASSUMING the residuals are Gaussian, the log-likelihood is given by: "$\log(L(\theta)) = -\frac{|D|}{2}\log(2\pi) - \frac{1}{2}\log(|K|) - \frac{1}{2}(x-\mu)^T K^{-1} (x-\mu)$, with $K$ being the covariance structure of your model, $|D|$ the number of points in your dataset, $\mu$ the mean response, and $x$ your dependent variable.
More specifically, AIC is calculated as $2k - 2\log(L)$, where $k$ is the number of fixed effects in your model and $L$ is your likelihood function [1]. It practically compares the trade-off between variance ($2k$) and bias ($2\log(L)$) in your modelling assumptions. As such, in your case it would compare two different log-likelihood structures when it came to the bias term. That is because when you calculate your log-likelihood you practically look at two terms: a fit term, denoted by $-\frac{1}{2}(x-\mu)^T K^{-1} (x-\mu)$, and a complexity penalization term, denoted by $-\frac{1}{2}\log(|K|)$."
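To make the quoted formulas concrete, here is a minimal sketch in Python (NumPy), assuming a toy iid Gaussian model with made-up data; the function names and the example covariance are purely illustrative:

```python
import numpy as np

def gaussian_log_likelihood(x, mu, K):
    """log L = -|D|/2 log(2 pi) - 1/2 log|K| - 1/2 (x - mu)^T K^{-1} (x - mu)."""
    d = x.size
    resid = x - mu
    _, logdet = np.linalg.slogdet(K)           # complexity penalization term uses log|K|
    quad = resid @ np.linalg.solve(K, resid)   # fit term (x - mu)^T K^{-1} (x - mu)
    return -0.5 * d * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

def aic(log_lik, k):
    """AIC = 2k - 2 log(L), with k the number of fitted parameters."""
    return 2 * k - 2 * log_lik

# Toy data: five observations, iid Gaussian model with fitted mean and variance (k = 2).
x = np.array([1.2, 0.8, 1.1, 0.9, 1.0])
mu = np.full_like(x, x.mean())
K = np.eye(x.size) * x.var()                   # ML variance estimate on the diagonal
print(aic(gaussian_log_likelihood(x, mu, K), k=2))
```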
Although the fit term is not called a loss function per se, it is one: the log-likelihood is maximized to yield an AIC value (a step left out in the explanation above), and that optimization is probably not fully characterized for general residuals that do not satisfy the given Gaussian assumption. That answers your question: yes, it is a loss function. What you didn't ask is whether AIC is good for anything.
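As a hedged illustration of that left-out maximization step, the sketch below (hypothetical data, still under the Gaussian assumption) fits two competing mean structures by maximum likelihood and compares their AIC values; the smaller AIC wins the variance/bias trade-off described above.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(20.0)
y = 0.3 * t + rng.normal(scale=1.0, size=t.size)    # hypothetical data with a weak trend

def gaussian_aic(y, fitted, k):
    """AIC for an iid Gaussian model evaluated at its ML variance estimate."""
    n = y.size
    sigma2 = np.mean((y - fitted) ** 2)              # ML estimate of the residual variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return 2 * k - 2 * log_lik

# Model 1: constant mean (parameters: mean, variance -> k = 2)
aic_const = gaussian_aic(y, np.full_like(y, y.mean()), k=2)

# Model 2: linear trend (parameters: intercept, slope, variance -> k = 3)
slope, intercept = np.polyfit(t, y, 1)
aic_line = gaussian_aic(y, intercept + slope * t, k=3)

print(aic_const, aic_line)                           # the smaller AIC is preferred
```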
Being a nitpicker or a visionary, I disagree with the Wikipedia statement: "An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (sometimes called a reward function, a profit function, a utility function, a fitness function, etc.), in which case it is to be maximized." While that makes sense for determining how good things can be (made), it can also make sense to maximize a loss function to determine how bad things could be (what is the worst outcome). Combining the two bounds the range of outcomes. Similarly for reward. – Mark L. Stone Aug 05 '16 at 21:56
Suggest you try to change Wikipedia; they may give you an argument. More important is that loss (or reward) functions can have local minima, be indeterminate, and be totally inappropriate to the physics of "loss" in the problems they model, which is what prompted my snide remark "you didn't ask...". – Carl Aug 05 '16 at 22:41
Left unsaid in my comment is that for the worst-case analysis you really need the global optimum. If, for instance, the loss function is convex, possibly subject to convex constraints, then minimizing it is an (often "easy") convex optimization problem, and therefore there are no non-global local minima. Finding the maximum of that convex function subject to the same constraints is a concave optimization problem, for which finding the global maximum can be quite difficult (of course, if the objective function is linear, then both directions are simultaneously convex and concave optimization problems). – Mark L. Stone Aug 05 '16 at 23:22
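To illustrate the asymmetry described in this comment, here is a small sketch using a made-up convex quadratic with box constraints and scipy.optimize.minimize (purely illustrative): minimizing the convex objective reaches the unique global minimum from any start, whereas maximizing it, i.e. minimizing its negative, is a concave optimization problem and a local solver may stop at whichever corner it happens to climb to.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] ** 2 + x[1] ** 2            # convex objective
bounds = [(-2.0, 1.0), (-1.0, 3.0)]            # box constraints (convex feasible set)

# Convex minimization: any starting point reaches the unique global minimum at (0, 0).
print(minimize(f, x0=[-2.0, 3.0], bounds=bounds).x)
print(minimize(f, x0=[1.0, -1.0], bounds=bounds).x)

# Concave optimization (maximizing the convex f): a local solver climbs to some corner
# of the box, which need not be the global maximum at (-2, 3).
print(minimize(lambda x: -f(x), x0=[0.5, -0.5], bounds=bounds).x)
print(minimize(lambda x: -f(x), x0=[-1.0, 2.0], bounds=bounds).x)
```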
@MarkL.Stone It gets more complicated with $n$-tuple independent variables in $(n+1)$-space, and, to make matters worse, sometimes the global optimum is not physical, so the desired solution is a local optimum for which even finding constraints can be problematic. So, how bad does it get? Blech, yuck, ack, fooey, that's how. – Carl Sep 17 '16 at 03:38
Can you provide an example for which the global optimum is not physical, so that the desired solution is a local optimum? Is such an optimization model formulation a good one? What I can tell you is that sometimes the "nearest" local optimum, whether or not it happens to be a global optimum, might tend to be the best solution. This could happen, for instance, in a sequential quadratic programming (SQP) algorithm, in which a quadratic program (QP) is solved to find the step to a new point, and a nearby local optimum may be better than a distant global optimum for which the quadratic model may be less valid. – Mark L. Stone Sep 17 '16 at 04:22
Fitting almost any two-density convolution to a time series has a tendency to drift to negative start times. This happens for various loss functions, including those that are properly modelled for measurement error. – Carl Sep 17 '16 at 04:28
That's what constraints are for. Is it problematic to "find" and impose non-negativity constraints? Those are "nice" constraints to have, from a computational standpoint. – Mark L. Stone Sep 17 '16 at 04:50
Yes, it can be problematic to impose non-negativity constraints. Trivial example: problems sometimes converge to zero from below, even when that implies an excursion into the complex field for a short distance. There are lots of other, more complicated examples. – Carl Sep 17 '16 at 04:56
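For reference on the mechanics being discussed, here is a hedged sketch of imposing a non-negativity bound on a fitted start time with scipy.optimize.curve_fit; the model, data, and parameter names are hypothetical, and whether the constrained optimum is physically meaningful is the separate issue raised above.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical decaying-exponential model with an onset (start) time t0.
def model(t, amplitude, rate, t0):
    return amplitude * np.exp(-rate * np.clip(t - t0, 0.0, None))

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 50)
y = model(t, 2.0, 0.5, 1.0) + rng.normal(scale=0.05, size=t.size)

# Non-negativity imposed on t0 (and the other parameters) via box bounds;
# an unconstrained fit could wander to a negative start time instead.
popt, _ = curve_fit(model, t, y, p0=[1.0, 1.0, 0.5],
                    bounds=([0.0, 0.0, 0.0], [np.inf, np.inf, np.inf]))
print(popt)
```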