I've been trying to understand how R calculates the standard errors it reports in the `summary()` of a regression fitted with `lm()`, for example:

```r
x <- 1:20
y1 <- 3 * x + rnorm(length(x), sd = 0.5)  # linear signal plus small noise
r1 <- lm(y1 ~ x)
s1 <- summary(r1)
s1$coefficients  # estimates, standard errors, t values, p values
```
In doing so I came across the `s1$cov.unscaled` matrix, which can be computed from the design matrix of the regression:

```r
X <- matrix(c(rep(1, length(x)), x), ncol = 2)  # design matrix: intercept column and x
M <- solve(t(X) %*% X)                          # (X'X)^{-1}
```

where `M` matches `s1$cov.unscaled` (up to floating-point error).
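A quick sanity check (`all.equal` allows for floating-point rounding; `check.attributes = FALSE` is needed because `M`, unlike `s1$cov.unscaled`, carries no dimnames):

```r
# the hand-built (X'X)^{-1} agrees with what summary(lm(...)) stores
all.equal(M, s1$cov.unscaled, check.attributes = FALSE)  # TRUE
```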
Since it depends only on `x`, `M` is also the same as `s2$cov.unscaled` in:
```r
y2 <- 3 * x + rnorm(length(x), sd = 5)  # same signal, ten times the noise
r2 <- lm(y2 ~ x)
s2 <- summary(r2)
```
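Despite the much larger noise in `y2`, the unscaled covariance matrix is unchanged, which is easy to verify:

```r
# both fits share the same design matrix, hence the same cov.unscaled
all.equal(s1$cov.unscaled, s2$cov.unscaled)  # TRUE
```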
where the residual error of the regression is much higher. Together with the residual standard error `s1$sigma`, `M` can then be used to reproduce the standard errors of the parameter estimates reported in `s1$coefficients`:

```r
sqrt(diag(M) * s1$sigma^2)  # standard errors of intercept and slope
```
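Placing the manual calculation next to the reported values shows they agree:

```r
# manual standard errors alongside those reported by summary()
cbind(manual   = sqrt(diag(M) * s1$sigma^2),
      reported = s1$coefficients[, "Std. Error"])
```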
The meaning of the standard error of a parameter estimate is quite intuitive, and it also makes sense that a measure of how well the regression fits the data (like `s1$sigma`) enters its calculation (i.e. $\widehat{\sigma}^2 (X^{\top}X)^{-1}$). What I am struggling to understand intuitively is the meaning of `s1$cov.unscaled` (or more generally $(X^{\top}X)^{-1}$), which depends only on `x` and not on `y1` or `y2`. Is there an intuitive interpretation of this matrix?
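For completeness, the textbook derivation behind this formula (assuming i.i.d. errors with variance $\sigma^2$; this is standard least-squares algebra, not anything R-specific) is

$$\widehat{\beta} = (X^{\top}X)^{-1}X^{\top}y, \qquad \operatorname{Var}(\widehat{\beta}) = (X^{\top}X)^{-1}X^{\top}\,\operatorname{Var}(y)\,X\,(X^{\top}X)^{-1} = \sigma^{2}(X^{\top}X)^{-1},$$

which explains algebraically, though not intuitively, why the factor $(X^{\top}X)^{-1}$ depends on the design alone.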
I'd be very grateful for any clarifications or reading suggestions on this matter!