I'm afraid the answer is embarrassingly obvious, but here goes... I was playing with R, trying to get the "giant" (Prof. Strang's word when explaining penalized regression) inverses of $A^\top A$ (the Gram matrix of the model matrix) in the presence of highly collinear regressors. I remembered the relationship of the inverse of $A^\top A$ to the variance of the parameter estimates, the direct relationship $\text{Var} (\hat \beta) = \sigma^2 \left(A^\top A \right)^{-1},$ which indicates that the high variance of the estimates under collinearity corresponds to large values in the inverse of the $A^\top A$ matrix. Of course this is addressed on the site:
If two or more columns of $A$ are highly correlated, one or more eigenvalue(s) of $A^\top A$ is close to zero and one or more eigenvalue(s) of $(A^\top A)^{−1}$ is very large.
Yet, to my surprise, it was $A^\top A,$ and not $(A^\top A)^{-1},$ that turned out to be the matrix with huge eigenvalues.
The toy model tries to predict yearly income from income taxes paid and money spent on weekends, with all variables highly correlated:
$$\text{income} \sim \text{income taxes} + \text{money spent on weekends}$$
# Manufacture the toy dataset with 100 entries
weekend_expend <- runif(100, 100, 2000)                    # weekend spending
income <- weekend_expend * 100 + runif(100, 10000, 20000)  # income driven by spending
taxes <- 0.4 * income + runif(100, 10000, 20000)           # taxes driven by income
df <- cbind(income, taxes, weekend_expend)                 # matrix of the three variables
pairs(df)                                                  # scatterplot matrix of all pairs
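# Quick numeric check of the collinearity (not in the original session; the
# pairs plot shows the same thing): by construction all pairwise correlations
# should come out close to 1.
round(cor(df), 3)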
> summary(mod <- lm(income ~ weekend_expend + taxes))
Call:
lm(formula = income ~ weekend_expend + taxes)
Residuals:
    Min      1Q  Median      3Q     Max 
-5337.7 -1885.9   165.8  2028.1  5474.6 

Coefficients:
                 Estimate Std. Error t value             Pr(>|t|)    
(Intercept)    5260.14790 1656.95983   3.175              0.00201 ** 
weekend_expend   81.55490    3.07497  26.522 < 0.0000000000000002 ***
taxes             0.46616    0.07543   6.180         0.0000000151 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2505 on 97 degrees of freedom
Multiple R-squared: 0.9981, Adjusted R-squared: 0.9981
F-statistic: 2.551e+04 on 2 and 97 DF, p-value: < 0.00000000000000022
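# Side note (not part of the original output): the "Std. Error" column above
# is just the square root of the diagonal of the coefficient covariance
# matrix, which is the quantity the variance formula at the top is about.
sqrt(diag(vcov(mod)))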
> # The model matrix is of the form...
> head(A <- model.matrix(mod))
  (Intercept) weekend_expend    taxes
1           1      1803.8237 92743.93
2           1       441.6305 33697.32
3           1       379.0888 36401.24
4           1      1129.1074 65869.23
5           1       558.3715 36708.88
6           1      1790.5604 92750.60
>
> # ... and A transpose A is:
> (A_tr_A <- t(A) %*% A)
                (Intercept) weekend_expend        taxes
(Intercept)           100.0       113189.2      6632490
weekend_expend     113189.2    159871091.4   8788158840
taxes             6632489.5   8788158839.9 492672410430
>
> # ... with its inverse:
> (inv_A_tr_A <- solve(A_tr_A))
                 (Intercept)    weekend_expend               taxes
(Intercept)    0.43758617285  0.00072025324389 -0.0000187385886210
weekend_expend 0.00072025324  0.00000150703080 -0.0000000365782573
taxes         -0.00001873859 -0.00000003657826  0.0000000009067669
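# Sanity check (not part of the original session): per the formula quoted at
# the top, sigma^2 * (A^T A)^{-1} should agree with the coefficient covariance
# matrix that lm() reports, up to rounding (the matrix is badly conditioned,
# hence the loose tolerance).
sigma2 <- summary(mod)$sigma^2
all.equal(sigma2 * inv_A_tr_A, vcov(mod), tolerance = 1e-4)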
>
> # The eigenvalues of A transpose A are...
> eigen(A_tr_A)$values
[1] 492829172338.305359      3109280.897155            2.285258
>
> # ... "huge" as compared to the eigenvalues of its inverse:
> eigen(inv_A_tr_A)$values
[1] 0.437587359169068602 0.000000321617773712 0.000000000002029101
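As a side check (not in the session above), the gap between those extreme eigenvalues can be summarised by the condition number of $A^\top A$:

# Condition number of A^T A: the ratio of its largest to its smallest
# eigenvalue (roughly 2e11 for the run shown above), another way of seeing
# how near-singular the collinearity makes the matrix.
ev <- eigen(A_tr_A)$values
max(ev) / min(ev)
kappa(A_tr_A, exact = TRUE)   # the same quantity via base R's kappa()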
The maximum eigenvalue of $A^\top A$ is $492829172338,$ while for $(A^\top A)^{-1}$ we get eigenvalues as small as $0.000000000002029101.$
I was expecting the opposite: much higher eigenvalues for the inverse of $A^\top A.$ So is this result spurious, or am I missing something critical?