I work on PDE inverse problems and I'm interested in how these can be viewed as problems of statistical inference. I'm looking for some model parameters $m$ which minimize the misfit with some data $d$ subject to some physics $G$:
$J[m] = \frac{1}{2}\|G(m) - d\|_X^2$
The operator $G$ is the inverse of a nonlinear elliptic partial differential operator, so in other words, a mess. This inverse problem is horribly ill-posed, so one imposes some amount of regularization:
$\bar J[m] = \frac{1}{2}\|G(m) - d\|_X^2 + \lambda\|m\|_Y^2$
I understand that this deterministic PDE inverse problem can be viewed as finding the maximum a posteriori (MAP) estimate of $m$, with the regularization term playing the role of prior information about the solution $m$ of the inverse problem.
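To spell out the correspondence as I understand it (assuming i.i.d. Gaussian noise with variance $\sigma^2$ and a mean-zero Gaussian prior with variance $\tau^2$; this notation is mine, just for illustration):
$$-\log p(m \mid d) = \frac{1}{2\sigma^2}\|G(m) - d\|_X^2 + \frac{1}{2\tau^2}\|m\|_Y^2 + \text{const},$$
so, after rescaling by $\sigma^2$, minimizing the negative log-posterior is the same as minimizing $\bar J$ with $\lambda = \sigma^2 / (2\tau^2)$.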
What I'm confused about is the statistical viewpoint on selecting the regularization parameter. In the literature on PDE inverse problems, nearly everyone uses the L-curve method: to find a balance between goodness-of-fit and simplicity, plot the curve
$\lambda \mapsto \left(\log\|G(m_\lambda) - d\|_X,\ \log\|m_\lambda\|_Y\right)$, where $m_\lambda$ denotes the minimizer of $\bar J$ for that value of $\lambda$;
the curve will (almost always) look like a capital letter "L", so take $\lambda$ to be the corner, i.e. the point of maximum curvature. However, I can hardly find this method mentioned anywhere in the stats literature. Instead, using cross-validation or an information criterion (Bayes, Akaike, etc.) seems much more common.
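In case it helps ground the question, here is a toy linear version of the L-curve recipe I have in mind (the operator, data, noise level, and $\lambda$ grid are all made up for illustration, and curvature is computed with respect to the index parameterization of the discrete curve rather than arc length):

```python
import numpy as np

# Toy stand-in for the nonlinear operator G: an ill-conditioned
# matrix with rapidly decaying singular values, plus noisy data.
rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -6, n)
G = U @ np.diag(s) @ V.T
m_true = V[:, 0] + 0.5 * V[:, 1]
d = G @ m_true + 1e-4 * rng.standard_normal(n)

lambdas = np.logspace(-12, 0, 200)
log_res, log_reg = [], []
for lam in lambdas:
    # Minimizer of (1/2)||G m - d||^2 + lam ||m||^2 satisfies
    # the normal equations (G^T G + 2 lam I) m = G^T d.
    m = np.linalg.solve(G.T @ G + 2 * lam * np.eye(n), G.T @ d)
    log_res.append(np.log(np.linalg.norm(G @ m - d)))
    log_reg.append(np.log(np.linalg.norm(m)))

# Corner of the L-curve = point of maximum curvature of the
# discrete curve (log residual norm, log solution norm).
x, y = np.array(log_res), np.array(log_reg)
dx, dy = np.gradient(x), np.gradient(y)
ddx, ddy = np.gradient(dx), np.gradient(dy)
curvature = (dx * ddy - dy * ddx) / (dx**2 + dy**2 + 1e-30) ** 1.5
lam_corner = lambdas[np.argmax(np.abs(curvature))]
print(lam_corner)
```

The corner picked this way is the $\lambda$ I would like to interpret statistically.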
Is there a statistical meaning to selecting $\lambda$ using the L-curve? It would be especially satisfying if I could say that the L-curve is the same as using something like the Akaike information criterion. I can kind of intuitively justify this to myself, but can't quite work out the details.