If one aims to maximise the likelihood of a quantity, modelled as a Gaussian, with respect to another quantity, the expression to maximise is the Gaussian log-likelihood:
$\log P(f(\mathbf{x}_{i}) | f(\mathbf{x}_{j}), \mathbf{\Sigma}) = - \frac{N}{2} \log |2 \pi \mathbf{\Sigma}| - \frac{1}{2} \sum_{i = 1}^{N}(f(\mathbf{x}_{i}) - f(\mathbf{x}_{j}))^{T} \mathbf{\Sigma}^{-1}(f(\mathbf{x}_{i}) - f(\mathbf{x}_{j}))$
where $f(\cdot)$ is some scalar-valued function of vectors (so that $\mathbf{\Sigma}$ is in fact a single variance $\sigma^{2}$, though the expression is written for the general vector-valued case).
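To spell out the least-squares form: assuming $\mathbf{\Sigma}^{-1}$ admits a Cholesky factorisation $\mathbf{\Sigma}^{-1} = \mathbf{L} \mathbf{L}^{T}$, maximising the log-likelihood above is, up to additive constants and the factor $\frac{1}{2}$, the same as minimising the sum of squared whitened residuals
$\sum_{i = 1}^{N} \| \mathbf{L}^{T} (f(\mathbf{x}_{i}) - f(\mathbf{x}_{j})) \|_{2}^{2}$
which is what makes the problem amenable to a least-squares solver.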
To maximise this expression within the Levenberg-Marquardt nonlinear least-squares framework, one computes the analytical gradients
$\frac{\partial \log P}{\partial \mathbf{x}_{i}} = \frac{\partial \log P}{\partial f} \frac{\partial f}{\partial \mathbf{x}_{i}}$
and uses them in the manner described in the Wikipedia article.
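As a concrete sketch (the toy model, targets, and covariance below are hypothetical stand-ins, not part of the original setup), the whitened residuals can be handed to an off-the-shelf LM solver such as `scipy.optimize.least_squares` with `method="lm"`:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical data for illustration: y plays the role of f(x_j),
# Sigma is the known covariance of the observations.
y = np.array([1.0, 2.0, 1.5])
Sigma = np.diag([0.1, 0.2, 0.1])

# Whitening: Sigma^{-1} = L L^T, so r^T Sigma^{-1} r = ||L^T r||^2.
L = np.linalg.cholesky(np.linalg.inv(Sigma))
W = L.T

def f(x):
    # Toy nonlinear model standing in for the f(.) of the question.
    return np.array([x[0], x[0] * x[1], np.sin(x[1])])

def residuals(x):
    # Maximising the Gaussian log-likelihood <=> minimising ||W (f(x) - y)||^2.
    return W @ (f(x) - y)

# method="lm" selects MINPACK's Levenberg-Marquardt implementation.
result = least_squares(residuals, x0=np.array([0.5, 0.5]), method="lm")
print(result.x)
```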
Now, suppose one wishes to place a Gaussian prior over $\mathbf{x}_{i}$ such that
$P(\mathbf{x}_{i}) = \mathcal{N}(\mathbf{x}_{i} | \mathbf{0}, \lambda^{-1} \mathbf{I})$
I understand that this is equivalent to adding a squared L2 penalty on $\mathbf{x}_{i}$, scaled by $\lambda$, to the least-squares objective above, as per this question.
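To make the correspondence explicit: with the isotropic prior above, taking logs gives
$\log P(\mathbf{x}_{i}) = -\frac{\lambda}{2} \|\mathbf{x}_{i}\|_{2}^{2} + \text{const}$
so maximising likelihood $\times$ prior is the same as appending the term $\frac{\lambda}{2} \|\mathbf{x}_{i}\|_{2}^{2}$ to the sum of squared residuals being minimised.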
However, what is not clear to me is how this fits into the LM framework, which is designed to work with least-squares objectives of the above form.
Does one simply differentiate and optimise w.r.t. $\lambda$ in addition to $\mathbf{x}_{i}$?