For the solution of $Ax = b$, where $A$ is a square matrix, what is the difference between these two regularized solutions:
- $x = (A + \alpha I)^{-1}b$ -- corresponding to eq. 3 below
- $x = (A^T A + \alpha I)^{-1}A^T b$ -- corresponding to eq. 2 below, with $K(w) = A$ and $K(-w) = A^T$
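For concreteness, here is a minimal numerical sketch of the comparison I have in mind (Python; the Gaussian kernel matrix, grid, bandwidth, and $\alpha$ are my own illustrative choices, not taken from the reference):

```python
import numpy as np

# Discretized positive symmetric difference kernel: a Gaussian kernel matrix.
# Grid size, bandwidth, and alpha are illustrative choices.
t = np.linspace(0.0, 1.0, 50)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.1**2))

rng = np.random.default_rng(0)
b = A @ rng.standard_normal(50)  # right-hand side generated from a known x

alpha = 1e-3

# Form 1: x = (A + alpha I)^{-1} b          (the "simplified" regularization)
x1 = np.linalg.solve(A + alpha * np.eye(50), b)

# Form 2: x = (A^T A + alpha I)^{-1} A^T b  (the ridge / normal-equations form)
x2 = np.linalg.solve(A.T @ A + alpha * np.eye(50), A.T @ b)

# Relative discrepancy between the two regularized solutions
print(np.linalg.norm(x1 - x2) / np.linalg.norm(x2))
```

Since $A$ is symmetric positive definite here, both solutions can be compared component by component in the eigenbasis of $A$, and the printed norm summarizes that comparison.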
In the following reference:
https://www.sciencedirect.com/science/article/abs/pii/0041555382900945#
for positive, symmetric difference kernels (for example, a Gaussian kernel), the first form of regularization is promoted as a simplified Tikhonov regularization (in statistical models we usually take $p = 0$, which gives $M(w) = I$).
Is this simplification just an approximation, or is it exactly equivalent to the second form above? Basically, I am trying to understand the connection between the discretized versions we use in statistics and the continuous versions used in integral equations.
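For reference, my reading of the two continuous (Fourier-domain) forms, under the correspondence above and with $M(w) = I$ (this transcription is mine, not quoted from the paper):

$$\hat{x}(w) = \frac{\hat{b}(w)}{K(w) + \alpha} \quad \text{(eq. 3, simplified form)}, \qquad \hat{x}(w) = \frac{K(-w)\,\hat{b}(w)}{K(-w)\,K(w) + \alpha} \quad \text{(eq. 2)}.$$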
This question is related to Regularized linear vs. RKHS-regression, where the answer claims a substantial difference, in contrast to the above recommendation.