I work on PDE inverse problems and I'm interested in how these can be viewed as problems of statistical inference. I'm looking for some model parameters $m$ which minimize the misfit with some data $d$ subject to some physics $G$:
$J[m] = \frac{1}{2}\|G(m) - d\|_X^2$
The operator $G$ is the inverse of a nonlinear elliptic partial differential operator, so in other words, a mess. This inverse problem is horribly ill-posed, so one imposes some amount of regularization:
$\bar J[m] = \frac{1}{2}\|G(m) - d\|_X^2 + \lambda\|m\|_Y^2$
I understand that this deterministic PDE inverse problem can be viewed as finding the maximum a posteriori (MAP) estimate of $m$, with the regularization term playing the role of prior information about the solution $m$ of the inverse problem.
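To spell out the correspondence as I understand it (assuming i.i.d. Gaussian noise with variance $\sigma^2$ and a mean-zero Gaussian prior with variance $\tau^2$; this notation is mine, just for illustration):
$$-\log p(m \mid d) = \frac{1}{2\sigma^2}\|G(m) - d\|_X^2 + \frac{1}{2\tau^2}\|m\|_Y^2 + \text{const},$$
so, after rescaling by $\sigma^2$, minimizing the negative log-posterior is the same as minimizing $\bar J$ with $\lambda = \sigma^2 / (2\tau^2)$.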
What I'm confused about is the statistical viewpoint on selecting the regularization parameter. In the literature on PDE inverse problems, nearly everyone uses the L-curve method: to find a balance between goodness-of-fit and simplicity, plot the curve
$\lambda \mapsto \left(\log\|G(m_\lambda) - d\|_X,\ \log\|m_\lambda\|_Y\right)$, where $m_\lambda$ denotes the minimizer of $\bar J$ for that value of $\lambda$;
the curve will (almost always) look like a capital letter "L", so take $\lambda$ to be the corner, i.e. the point of maximum curvature. However, I can hardly find this method mentioned anywhere in the stats literature. Instead, using cross-validation or an information criterion (Bayes, Akaike, etc.) seems much more common.
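In case it helps ground the question, here is a toy linear version of the L-curve recipe I have in mind (the operator, data, noise level, and $\lambda$ grid are all made up for illustration, and curvature is computed with respect to the index parameterization of the discrete curve rather than arc length):

```python
import numpy as np

# Toy stand-in for the nonlinear operator G: an ill-conditioned
# matrix with rapidly decaying singular values, plus noisy data.
rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -6, n)
G = U @ np.diag(s) @ V.T
m_true = V[:, 0] + 0.5 * V[:, 1]
d = G @ m_true + 1e-4 * rng.standard_normal(n)

lambdas = np.logspace(-12, 0, 200)
log_res, log_reg = [], []
for lam in lambdas:
    # Minimizer of (1/2)||G m - d||^2 + lam ||m||^2 satisfies
    # the normal equations (G^T G + 2 lam I) m = G^T d.
    m = np.linalg.solve(G.T @ G + 2 * lam * np.eye(n), G.T @ d)
    log_res.append(np.log(np.linalg.norm(G @ m - d)))
    log_reg.append(np.log(np.linalg.norm(m)))

# Corner of the L-curve = point of maximum curvature of the
# discrete curve (log residual norm, log solution norm).
x, y = np.array(log_res), np.array(log_reg)
dx, dy = np.gradient(x), np.gradient(y)
ddx, ddy = np.gradient(dx), np.gradient(dy)
curvature = (dx * ddy - dy * ddx) / (dx**2 + dy**2 + 1e-30) ** 1.5
lam_corner = lambdas[np.argmax(np.abs(curvature))]
print(lam_corner)
```

The corner picked this way is the $\lambda$ I would like to interpret statistically.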
Is there a statistical meaning to selecting $\lambda$ using the L-curve? It would be especially satisfying if I could say that the L-curve is the same as using something like the Akaike information criterion. I can kind of intuitively justify this to myself, but can't quite work out the details.