
I have some questions about the hat matrix in linear models.

My first question is: why, in a balanced one-way layout $(n_1=\dots=n_c=n_0)$, do all leverages $h_{ii}$ have the same value $\frac{1}{n_0}$? I know that $h_{ii}$ is the $(i,i)$ entry of the hat matrix $H=X(X^TX)^{-1}X^T$, but I can't see how this expression leads to that result.

My second question is:

When discussing leverage points in a general linear model, we know that $$0\leq h_{ii}\leq1,$$ $$\sum_{i=1}^n h_{ii}=p,$$ where $p$ is the number of parameters to be estimated, and $$\operatorname{Var}(\hat{Y}_i)=\sigma^2 h_{ii}=\frac{\sigma^2}{1/h_{ii}}.$$ The text then says that $\frac{1}{h_{ii}}$ is roughly the number of observations needed to estimate $\hat{Y}_i$. Why is that?
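For what it's worth, the first two properties are easy to check numerically on an arbitrary made-up design matrix (a small numpy sketch; the matrix below is just an example), so my question is only about the interpretation of $1/h_{ii}$:

```python
import numpy as np

# A made-up design matrix just to check the two leverage properties numerically:
# 0 <= h_ii <= 1 and sum_i h_ii = p.
rng = np.random.default_rng(42)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 covariates

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
h = np.diag(H)                         # leverages

print(h.min() >= 0 and h.max() <= 1)   # True
print(np.isclose(h.sum(), p))          # True: trace(H) = p
```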

My third question is:

Continuing with my second question, the text then says that if $h_{ii}$ is very close to 1, then the variance of the $i$th residual is very close to zero (I already know that $\operatorname{Var}(E_i)=\sigma^2(1-h_{ii})$), so $Y_i-\sum_{j=1}^p\hat{\beta}_j x_{ij}\simeq0$. Isn't this the expression for the $i$th residual? Why can we conclude this from the residual having zero variance? I mean, couldn't $E_i$ be a nonzero constant, so that its variance is also zero?

My last question is:

Continuing with the above, the text then concludes that $\hat{Y}_i\simeq Y_i$ and hence almost one degree of freedom has to be used just to fit this one observation. Could anyone explain this result to me? Isn't $\hat{Y}_i=Y_i$ exactly what we want? I think it means we estimate the $i$th observation perfectly.

Lazer
  • There is probably some close duplicate for this ... but I cannot find it. Seems we need some really good "summary post" with a canonical answer for properties of leverage in linear models ... – kjetil b halvorsen May 15 '16 at 19:27

1 Answer


We just need to calculate the hat matrix. Write the model for the one-way layout in the form $Y_{ij}= \alpha_i +\epsilon_{ij}$, $i=1,2,\dotsc,p$, $j=1,\dotsc,n_i$, with one parameter per group and no explicit intercept; the total number of observations is $n=\sum_i n_i$. This parametrization makes the calculations simpler, and the hat matrix does not depend on the parametrization chosen. The design matrix $X$ then has the form $$ X =\begin{pmatrix} 1 & 0 & \dots & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix}, $$ where column $l$ contains $n_l$ ones, one for each observation in group $l$. It is then easy to calculate that $X^T X = \operatorname{diag}( n_1, \dotsc, n_p )$, with inverse $\operatorname{diag}( n_1^{-1}, \dotsc, n_p^{-1} )$. Finally, $$ H = X (X^T X)^{-1}X^T = (h_{ij}), $$ where $$ h_{ij}= \sum_{s,l} X_{is}\, (X^T X)^{-1}_{sl}\, (X^T)_{lj} = \sum_s X_{is}\, n_s^{-1}\, X_{js} = \begin{cases} n_l^{-1} &\text{if observations $i,j$ belong to the same group $l$,} \\ 0 &\text{otherwise.} \end{cases} $$ So $H$ is block diagonal, with one block of size $n_l\times n_l$ per group, all elements of block $l$ equal to $n_l^{-1}$. In particular $h_{ii}=1/n_l$ when observation $i$ belongs to group $l$; in the balanced case $n_1=\dotsb=n_p=n_0$, so every leverage equals $1/n_0$, which answers your first question.
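Here is a small numerical check of this calculation (my own sketch, not part of the derivation above; the choice of $p=3$ groups with $n_0=4$ observations each is arbitrary):

```python
import numpy as np

# One-way layout with p = 3 groups of n0 = 4 observations each,
# one indicator column per group, no intercept (as in the derivation above).
p, n0 = 3, 4
groups = np.repeat(np.arange(p), n0)   # group label of each observation
X = np.eye(p)[groups]                  # n x p indicator (dummy) matrix

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix

print(np.round(H, 3))                  # block diagonal, each block filled with 1/n0
print(np.allclose(np.diag(H), 1 / n0)) # all leverages equal 1/n0 -> True
print(np.isclose(np.trace(H), p))      # sum of leverages equals p -> True
```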

You can then easily check the properties listed in your second question. For the interpretation of $1/h_{ii}$: $\hat{Y}=H Y$, so $\operatorname{Var}(\hat{Y})=H\operatorname{Var}(Y)H^T=\sigma^2 H$, using that $H$ is symmetric and idempotent. Then, as you have given, $$ \operatorname{Var}(\hat {Y}_i)=\sigma^2 h_{ii}=\frac{\sigma^2}{1/h_{ii}}, $$ and the statement that $1/h_{ii}$ is roughly the number of observations used (rather than "needed") to estimate $\hat{Y}_i$ follows by noting that a mean of $n$ independent observations has variance $\sigma^2/n$, and here identifying $n$ with $1/h_{ii}$.
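To see this "effective number of observations" interpretation in action, here is a small simulation sketch (my own illustration; the group means and $\sigma$ are arbitrary) checking that the empirical variance of $\hat{Y}_i$ matches $\sigma^2 h_{ii}=\sigma^2/n_0$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same balanced layout as above: p = 3 groups of n0 = 4 observations each.
p, n0, sigma = 3, 4, 2.0
groups = np.repeat(np.arange(p), n0)
X = np.eye(p)[groups]
H = X @ np.linalg.inv(X.T @ X) @ X.T

alpha = np.array([1.0, 5.0, -2.0])                            # arbitrary true group means
Y = alpha[groups] + sigma * rng.normal(size=(10000, p * n0))  # 10000 simulated data sets
Y_hat = Y @ H                                                 # fitted values (H is symmetric)

print(np.round(Y_hat.var(axis=0), 2))   # empirical Var(Y_hat_i), roughly 1.0 each
print(sigma**2 * np.diag(H))            # theoretical sigma^2 * h_ii = sigma^2 / n0 = 1.0
```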

Finally, your third and fourth questions: the residual is $r_i = Y_i - \hat{Y}_i$, with variance $(1-h_{ii})\sigma^2$. If $h_{ii}=1$, that variance is zero, and since the residual has expectation zero, zero variance means it is identically zero (not just some nonzero constant), i.e. $\hat{Y}_i=Y_i$ with certainty. That would perhaps be good if it were believable, but it is too good to be true: it is not a genuine prediction based on the other observations, it is just a copy of the observation into its own fitted value. Use the form of the hat matrix $H$ we calculated above: $h_{ii}=1/n_l$, where $i$ belongs to group $l$, so $h_{ii}=1$ really means that $n_l=1$. Then $$ \hat{Y}_i = (HY)_{i} = \sum_{j=1}^n h_{ij} Y_j = h_{ii} Y_i = Y_i, $$ since from the block-diagonal form of $H$ you can see that $h_{ij}=0$ for $j\neq i$. So the "perfect prediction" (and the zero residual) is a chimera: the group parameter $\alpha_l$ is estimated from that single observation alone, which is why almost one degree of freedom is used up just to fit it.
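A quick numerical illustration of this last point (my own sketch, with an arbitrary unbalanced layout in which the last group has a single observation):

```python
import numpy as np

# Unbalanced one-way layout with group sizes 3, 2, 1.
groups = np.array([0, 0, 0, 1, 1, 2])
X = np.eye(3)[groups]
H = X @ np.linalg.inv(X.T @ X) @ X.T

rng = np.random.default_rng(1)
Y = rng.normal(size=6)            # arbitrary data
resid = Y - H @ Y                 # residuals

print(np.diag(H))                 # leverages: [1/3, 1/3, 1/3, 1/2, 1/2, 1.0]
print(resid[-1])                  # ~0: the single-observation group has a zero residual
```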

kjetil b halvorsen