
For a linear regression $Y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal N(0,\sigma^2 I)$, we have $\hat Y = H Y$ for the hat matrix $H = X(X^TX)^{-1}X^T$. This means that $Var(Y - \hat Y) = \sigma^2(I-H)$, so in particular $Var(Y_i - \hat Y_i) = \sigma^2(1-h_i)$, where $h_i = H_{ii}$ is the $i$th leverage.
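
As a quick sanity check on that variance formula, here's a small R sketch (simulated data; the seed, $n$, $p$, and $\sigma$ are arbitrary choices) comparing Monte Carlo residual variances against $\sigma^2(1-h_i)$:

```r
set.seed(1)
n <- 100; p <- 3; sigma <- 2
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))    # design matrix with an intercept
H <- X %*% solve(t(X) %*% X, t(X))              # hat matrix H = X (X^T X)^{-1} X^T
h <- diag(H)                                    # leverages h_i
beta <- rnorm(p)
sims <- replicate(5000, {
  y <- drop(X %*% beta) + rnorm(n, sd = sigma)  # Y = X beta + eps
  drop(y - H %*% y)                             # residuals Y - Yhat
})
max(abs(apply(sims, 1, var) - sigma^2 * (1 - h)))  # ~ 0 up to Monte Carlo error
```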

Suppose my predictor matrix $X$ has rows $x_1,\dots,x_n\in\mathbb R^p$. I want to bound this residual variance, and therefore $1-h_i$, in terms of $\|x_i - \frac 1n X^T\mathbf 1\|^2$, the squared distance from $x_i$ to the mean of the rows. Is there something nice I can do here?

If this is a simple linear regression I know $$ 1 - h_i = 1 - \frac 1n - \frac{(x_i - \bar x)^2}{\sum_j (x_j - \bar x)^2} $$ so I am handed such a bound via the appearance of the $(x_i - \bar x)^2$ term. But what about in a multiple regression?
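
The identity is easy to confirm numerically; here's a minimal R sketch (made-up data) checking it against `hatvalues()`:

```r
set.seed(2)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)       # arbitrary simple regression
h <- hatvalues(lm(y ~ x))       # leverages from the fitted model
h_formula <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)
max(abs(h - h_formula))         # ~ 0 up to floating point
```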


1 Answer


I was able to come up with a bound, although it's not very tight.

Let $X = UDV^T$ be the thin SVD of $X$ (so $U$ is $n\times p$ with orthonormal columns, using that $X$ has full column rank), which gives $H = UU^T$. Let $u_1,\dots,u_n \in \mathbb R^p$ be the rows of $U$ (as column vectors), which means that $h_i = \|u_i\|^2$.
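
As a quick numerical check of this (a simulated $X$ with an intercept column, full rank almost surely; the sizes are arbitrary):

```r
set.seed(3)
n <- 40; p <- 4
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))
U <- svd(X)$u                        # thin SVD: U is n x p, orthonormal columns
H <- X %*% solve(t(X) %*% X, t(X))   # hat matrix
max(abs(H - tcrossprod(U)))          # H = U U^T, ~ 0
max(abs(diag(H) - rowSums(U^2)))     # h_i = ||u_i||^2, ~ 0
```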

Let $s_i^2 = \|x_i - \frac 1n X^T\mathbf 1\|^2$. If $e_i$ is the $i$th standard basis vector then $x_i = X^Te_i$, so I can write $$ s_i^2 = \|X^Te_i - \frac 1n X^T\mathbf 1\|^2 = \|X^T(e_i - \frac 1n\mathbf 1)\|^2 \\ = (e_i - \frac 1n\mathbf 1)^TXX^T(e_i - \frac 1n\mathbf 1) \\ = (e_i - \frac 1n\mathbf 1)^TUD^2U^T(e_i - \frac 1n\mathbf 1). $$

I don't like the $D^2$, since it seems more useful to have a quadratic form in $H=UU^T$ than in $XX^T$, so I'll use the fact that $d_1^2$ is the largest squared singular value to get $$ s_i^2 \leq d_1^2(e_i - \frac 1n\mathbf 1)^TUU^T(e_i - \frac 1n\mathbf 1) \\ = d_1^2 \left(e_i^TUU^Te_i - \frac 2n \mathbf 1^TUU^Te_i + \frac 1{n^2}\mathbf 1^TUU^T\mathbf 1\right) \\ = d_1^2 \left(h_i - \frac 2n \mathbf 1^TUU^Te_i + \frac 1{n^2}\mathbf 1^TUU^T\mathbf 1\right). $$

Now I'll assume that $\mathbf 1$ is in the column space of $X$ (i.e. the model has an intercept, which shouldn't be too controversial). This means that $H\mathbf 1 = UU^T\mathbf 1 = \mathbf 1$, so $\mathbf 1^TUU^Te_i = \mathbf 1^Te_i = 1$ and $\mathbf 1^TUU^T\mathbf 1 = \mathbf 1^T\mathbf 1 = n$, and therefore $$ s_i^2 \leq d_1^2 \left(h_i - \frac 2n + \frac 1n\right) = d_1^2 \left(h_i - \frac 1n \right), $$ which implies $$ h_i \geq \frac{s_i^2}{d_1^2} + \frac 1n \iff 1 - h_i \leq 1 - \frac 1n - \frac{s_i^2}{d_1^2}. $$

Thus as $s_i^2$ increases this upper bound on the variance of a residual goes down, mirroring the simple-regression case, which is nice to see. But I'm not particularly happy with this, as $d_1^2$ is often massive in the experiments I've done, so in practice it's close to just saying $1 - h_i \leq 1 - \frac 1n$.
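
Here's a small numerical check of the final inequality on simulated data (arbitrary sizes; an intercept column so the assumption above holds), which also illustrates how small $s_i^2/d_1^2$ tends to be:

```r
set.seed(4)
n <- 200; p <- 5
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))
sv <- svd(X)
h  <- rowSums(sv$u^2)                         # leverages h_i = ||u_i||^2
Xc <- scale(X, center = TRUE, scale = FALSE)  # rows are x_i - xbar
s2 <- rowSums(Xc^2)                           # s_i^2
d1sq <- sv$d[1]^2                             # largest squared singular value
all(1 - h <= 1 - 1/n - s2 / d1sq + 1e-12)     # TRUE: the bound holds
range(s2 / d1sq)                              # typically tiny, so the bound is loose
```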

Is there a better bound available?

  • @Hans this is in the context of linear regression so $X$ is assumed to be full rank. $H = X(X^TX)^{-1}X^T$ wouldn't even be defined otherwise as $X^TX$ would be singular – alfalfa Apr 26 '19 at 19:35
  • $X$ is usually not full rank, so neither is $H$ and thus nor $U$. Therefore usually $UU^T\mathbf 1\ne \mathbf 1$. Since $UU^T\mathbf 1$ is the orthogonal projection of $\mathbf 1$ onto the column space of $U$, usually $\|UU^T\mathbf 1\|< \|\mathbf 1\|$. – Hans Apr 26 '19 at 19:36
  • I see the confusion. I should have said $U$ is of the same size as $X$, which is $n\times m$ with $n>m$, so $UU^T\ne I$. My conclusion that $\|UU^T\mathbf 1\| < \|\mathbf 1\|$ still stands. – Hans Apr 26 '19 at 19:47
  • @Hans I don't understand why you're saying that $X$ is usually not full rank. A unique $\hat\beta$ wouldn't even exist if that were so. In my question I explicitly refer to $(X^TX)^{-1}$, which doesn't exist if $X$ is low rank. And if there's an intercept then $\mathbf 1$ is in the column space of $X$, so $H\mathbf 1 = \mathbf 1$. I just tried an example in R and it shows $UU^T\mathbf 1 = \mathbf 1$. – alfalfa Apr 26 '19 at 19:50
  • I apologize for mixing up the nomenclature. Consider the case $X$ is of size $n\times 1$. What is the size of $U$? Do you agree it should also be of size $n\times 1$? – Hans Apr 26 '19 at 19:54
  • @Hans yeah, if $X$ is $n\times p$ then $U$ will be $n\times p$ as well, although if $X$ has rank $r < p$ I'll only need the first $r$ columns of $U$ to form a basis for the column space of $X$. And if $X$ is full rank, $UU^T$ will be rank $p$. – alfalfa Apr 26 '19 at 19:56
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/92917/discussion-between-hans-and-alfalfa). – Hans Apr 26 '19 at 19:58
  • I made a small suggestion in our chat. To clarify, I suggest multiplying the linear regression equation on the left by the orthogonal projection $I-\frac1n\mathbf 1\mathbf 1^T$. – Hans Apr 26 '19 at 22:32
  • So, what is your result? – Hans Apr 27 '19 at 18:09