Some time ago, while reading *Advanced Data Analysis from an Elementary Point of View* by Cosma Rohilla Shalizi, I came across this statement in a section about significant coefficients:
> Moreover, at a fixed sample size, the coefficients with smaller standard errors will tend to be the ones whose variables have more variance, and whose variables are less correlated with the other predictors. High input variance and low correlation help us estimate the coefficient precisely, but, again, they have nothing to do with whether the input variable actually influences the response a lot.
In a simple regression setting this is clear from the formula for the standard error of the estimator $\hat{\beta}$:
$$s_{\hat{\beta}}=\sqrt{\frac{\frac{1}{n-2}\sum_{i=1}^{n}\hat{\varepsilon}_{i}^{\,2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}.$$
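As a quick numerical check (a sketch in Python with simulated data, not an example from the book), stretching $x$ while keeping the error variance fixed makes the estimated standard error of $\hat{\beta}$ shrink:

```python
# Simple regression: more spread in x (larger sum of (x_i - xbar)^2)
# means a smaller standard error for beta-hat, everything else equal.
import numpy as np

rng = np.random.default_rng(0)
n = 200
beta = 0.5

for scale in (1.0, 2.0, 4.0):            # increasing spread of x
    x = scale * rng.normal(size=n)
    y = beta * x + rng.normal(size=n)     # same error variance each time
    xc = x - x.mean()
    b_hat = np.sum(xc * (y - y.mean())) / np.sum(xc**2)
    resid = (y - y.mean()) - b_hat * xc   # residuals of the fit with intercept
    s2 = np.sum(resid**2) / (n - 2)
    se = np.sqrt(s2 / np.sum(xc**2))
    print(f"sd(x) ~ {scale}: se(beta_hat) = {se:.4f}")
```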
Moving to multivariate OLS, however, I cannot see why the same is true from the corresponding formula for the estimated standard error of a generic $\hat{\beta}_j$:
$$\widehat{\operatorname{s.e.}}(\hat{\beta}_{j})=\sqrt{s^{2}\left[(X^{T}X)^{-1}\right]_{jj}}.$$
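To make sure I am reading that formula correctly, this is how I compute it (my own sketch with simulated data, checked against statsmodels):

```python
# Compute se(beta_j) = sqrt(s^2 [(X'X)^{-1}]_{jj}) by hand and compare
# with the standard errors reported by statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))           # intercept + 2 predictors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - X.shape[1])                   # s^2 with n - p df
se_manual = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

print(se_manual)
print(sm.OLS(y, X).fit().bse)                           # should agree
```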
Running some tests with matrix inversion I found that it is generally true, and perhaps it can be seen from Cramer's rule, $\displaystyle A^{-1}=\frac{1}{\det(A)}\operatorname{adj}(A)$.
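For example, this is the kind of test I mean (a sketch with made-up data): the diagonal element $[(X^{T}X)^{-1}]_{jj}$ grows as the predictors become more correlated and shrinks as the variance of $x_j$ grows, so for a fixed $s^2$ the same happens to $\widehat{\operatorname{s.e.}}(\hat{\beta}_j)$:

```python
# With two centered predictors, watch [(X'X)^{-1}]_{11} as their
# correlation rho and the scale of x_1 change.
import numpy as np

rng = np.random.default_rng(2)
n = 1000

def diag_11(rho, scale):
    # draw (x1, x2) with correlation rho, then stretch x1 by `scale`
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    X[:, 0] *= scale
    X -= X.mean(axis=0)                   # center the columns
    return np.linalg.inv(X.T @ X)[0, 0]   # element for x_1

for rho in (0.0, 0.5, 0.9):               # more correlation -> larger element
    print(f"rho = {rho}: {diag_11(rho, 1.0):.5f}")
for scale in (1.0, 2.0, 4.0):             # more variance -> smaller element
    print(f"scale = {scale}: {diag_11(0.5, scale):.5f}")
```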
Can someone provide me with some insight into this?