
In linear least squares the parameter estimates are: $\hat{\beta} = \left(X^{\top}X\right)^{-1}X^{\top}y$. In Ridge regression the standardized parameter estimates are given by $\hat{\beta}_{\Gamma} = \left(X^{\top}X + \Gamma\right)^{-1}X^{\top}y$.

Heteroskedasticity-consistent variance-covariance estimators use leverage values to correct for bias in the variance-covariance estimates. For linear least squares, two examples are HC2 and HC3, whose effective squared residuals are $\tilde{u}_i^2 = \frac{\hat{u}_i^2}{1 - h_{ii}}$ and $\tilde{u}_i^2 = \frac{\hat{u}_i^2}{\left(1 - h_{ii}\right)^{2}}$, respectively, where the $h_{ii}$ are the diagonal entries of the hat (projection) matrix of the model matrix, $H = X\left(X^{\top}X\right)^{-1}X^{\top}$.
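For concreteness, here is a minimal numpy sketch of these quantities (the data are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # model matrix
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # OLS estimates
u_hat = y - X @ beta_hat                      # raw residuals
H = X @ np.linalg.solve(X.T @ X, X.T)         # hat (projection) matrix
h = np.diag(H)                                # leverage values h_ii

u2_hc2 = u_hat**2 / (1 - h)                   # HC2 effective squared residuals
u2_hc3 = u_hat**2 / (1 - h)**2                # HC3 effective squared residuals
```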

My question is: for Ridge regression, would the leverage-adjusted residuals use $H = X\left(X^{\top}X\right)^{-1}X^{\top}$, or do they use a different form due to the fact that

$\hat{y}_{OLS} = X\left(X^{\top}X\right)^{-1}X^{\top}y \ne X\left(X^{\top}X + \Gamma\right)^{-1}X^{\top}y = \hat{y}_{\Gamma}$? If different, what is the form? How does it generalize to clusters?
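As a small numeric check of the inequality above (continuing the sketch, with an arbitrary example $\Gamma$):

```python
Gamma = 0.5 * np.eye(p)  # example penalty matrix, chosen arbitrarily
H_ridge = X @ np.linalg.solve(X.T @ X + Gamma, X.T)  # ridge "hat" matrix
print(np.allclose(H @ y, H_ridge @ y))  # False: fitted values differ
```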

  • Ridge regression can be calculated via ordinary least squares extended with some quasi-data, representing ridge via a Bayesian prior. That gives an extended $X$ matrix (extended with one new row per parameter). Just use the usual formula with this extended $X$ matrix? – kjetil b halvorsen Nov 26 '17 at 20:23
  • For cluster robust versions would each "meat" include the added parameters as being in the cluster? – José Bayoán Santiago Calderón Nov 26 '17 at 20:39
  • There are some published papers: http://www.tandfonline.com/doi/abs/10.1080/00401706.1988.10488370, http://www.sciencedirect.com/science/article/pii/S0167947399000195 – kjetil b halvorsen Nov 29 '17 at 16:05
  • According to this article, the fix is just to use the ridge hat matrix rather than the projection of $X$. For weighted leverage values à la McCaffrey and Bell (2006) it might use the cluster-specific sub-matrices. @kjetilbhalvorsen does the extended data approach yield the same leverage values? https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-372#Sec1 – José Bayoán Santiago Calderón Dec 13 '17 at 04:04

1 Answer


Ridge regression can be calculated via ordinary least squares (OLS) with the data matrix $X$ extended with some surrogate data, taken as corresponding to the surrogate observations $Y_0=0$. Write the model, extended with the surrogate data, as $$ \begin{pmatrix} Y \\ Y_0=0\end{pmatrix} = \begin{pmatrix} X\beta \\ X_0 \beta \end{pmatrix} + \begin{pmatrix} \epsilon \\ \epsilon_0 \end{pmatrix} $$ Using this surrogate formulation, we can calculate the *usual* OLS estimator as $$ ( X^TX + X_0^T X_0)^{-1} (X^T Y + X_0^T 0) = (X^T X + X_0^T X_0)^{-1}X^TY $$ and comparing that with your expression for the ridge estimator shows that you need to solve $$ \Gamma = X_0^T X_0 $$ for $X_0$; any matrix square root will do, for instance the Cholesky decomposition.
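To make this concrete, here is a small numpy sketch (with simulated data and an arbitrary example $\Gamma$) checking that OLS on the extended data reproduces the ridge estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

Gamma = np.diag([0.5, 1.0, 2.0])          # example penalty matrix
X0 = np.linalg.cholesky(Gamma).T          # matrix square root: X0' X0 = Gamma
X_ext = np.vstack([X, X0])                # (n + p) x p extended data matrix
y_ext = np.concatenate([y, np.zeros(p)])  # surrogate responses Y_0 = 0

beta_ext = np.linalg.solve(X_ext.T @ X_ext, X_ext.T @ y_ext)  # usual OLS
beta_ridge = np.linalg.solve(X.T @ X + Gamma, X.T @ y)        # ridge
print(np.allclose(beta_ext, beta_ridge))  # True
```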

Then you can use the usual formulas for leverage with the extended data matrix $$ \begin{pmatrix} X \\ X_0 \end{pmatrix} $$ which is an $(n+p)\times p$ matrix, so the first $n$ leverage values correspond to the data. As a bonus, the last $p$ leverage values give leverage information on the surrogate data, i.e., how much influence the surrogate data have.
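Continuing the sketch above, the leverage values of the extended matrix are:

```python
H_ext = X_ext @ np.linalg.solve(X_ext.T @ X_ext, X_ext.T)  # extended hat matrix
h_ext = np.diag(H_ext)
h_data = h_ext[:n]       # leverage of the n actual observations
h_surrogate = h_ext[n:]  # leverage of the p surrogate rows
```

Note that since $X_{ext}^{\top}X_{ext} = X^{\top}X + \Gamma$, the first $n$ leverage values are exactly the diagonal of the ridge hat matrix $X\left(X^{\top}X + \Gamma\right)^{-1}X^{\top}$.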

You also ask about cluster robust versions. I don't know about those, but I guess the same approach can be used.

kjetil b halvorsen
  • Obvious +1. The ridge regression by data augmentation is one of my favourite procedures! It is worth noting that linear mixed effect regression models can be formulated in a similar manner. :D – usεr11852 Nov 30 '17 at 21:19