For the solution of $Ax = b$, where $A$ is a square matrix, what is the difference between these two regularized solutions:
- $x = (A + \alpha I)^{-1}b$ -- corresponding to eq. 3 below
- $x = (A^T A + \alpha I)^{-1}A^T b$ -- corresponding to eq. 2 below, with $K(w) = A$ and $K(-w) = A^T$
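For concreteness, here is a minimal numerical sketch of the comparison I have in mind (Python; the Gaussian kernel matrix, grid, bandwidth, and $\alpha$ are my own illustrative choices, not taken from the reference):

```python
import numpy as np

# Discretized positive symmetric difference kernel: a Gaussian kernel matrix.
# Grid size, bandwidth, and alpha are illustrative choices.
t = np.linspace(0.0, 1.0, 50)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.1**2))

rng = np.random.default_rng(0)
b = A @ rng.standard_normal(50)  # right-hand side generated from a known x

alpha = 1e-3

# Form 1: x = (A + alpha I)^{-1} b          (the "simplified" regularization)
x1 = np.linalg.solve(A + alpha * np.eye(50), b)

# Form 2: x = (A^T A + alpha I)^{-1} A^T b  (the ridge / normal-equations form)
x2 = np.linalg.solve(A.T @ A + alpha * np.eye(50), A.T @ b)

# Relative discrepancy between the two regularized solutions
print(np.linalg.norm(x1 - x2) / np.linalg.norm(x2))
```

Since $A$ is symmetric positive definite here, both solutions can be compared component by component in the eigenbasis of $A$, and the printed norm summarizes that comparison.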
In the following reference:
https://www.sciencedirect.com/science/article/abs/pii/0041555382900945#
for positive, symmetric difference kernels (for example, a Gaussian kernel), the first form of regularization is promoted as a simplified Tikhonov regularization (in statistical models we usually take $p = 0$, which gives $M(w) = I$).
Is this simplification just an approximation, or is it exactly equivalent to the second form above? Basically, I am trying to understand the connection between the discretized versions we use in statistics and the continuous versions used in integral equations.
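For reference, my reading of the two continuous (Fourier-domain) forms, under the correspondence above and with $M(w) = I$ (this transcription is mine, not quoted from the paper):

$$\hat{x}(w) = \frac{\hat{b}(w)}{K(w) + \alpha} \quad \text{(eq. 3, simplified form)}, \qquad \hat{x}(w) = \frac{K(-w)\,\hat{b}(w)}{K(-w)\,K(w) + \alpha} \quad \text{(eq. 2)}.$$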
This question is related to Regularized linear vs. RKHS-regression, where the answer claims a substantial difference, in contrast to the above recommendation.