Suppose we want to do multiple individual (componentwise) regressions, like the ones used in Sure Independence Screening (Fan & Lv, 2007). Then we have:

$$\hat\beta_{ind} = \frac{1}{n}X^Ty$$

(assuming the columns of $X$ are normalized so that $x_j^Tx_j = n$)

i.e., $\hat\beta_{j,ind}$ is the estimate obtained from regressing $y$ on $x_j$ alone (the single-covariate model $y=\beta_j x_j$).


Compare this with the full linear regression (normal equation) which is: $$\hat\beta = (X^TX)^{-1}X^Ty $$

It appears that the full linear estimator is actually some linear transformation (projection even?) of the individual regression coefficients! I.e., $$\hat\beta = P\hat\beta_{ind} $$
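To make this concrete, here is a small NumPy sketch (simulated data; the columns are standardized so that $x_j^Tx_j = n$, matching the normalization above) that checks both claims numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

# Simulated design with correlated columns, then standardized so that x_j' x_j = n
X = rng.normal(size=(n, p)) @ np.array([[1.0, 0.5, 0.2],
                                        [0.0, 1.0, 0.5],
                                        [0.0, 0.0, 1.0]])
X = (X - X.mean(axis=0)) / X.std(axis=0)      # ddof=0, so each column satisfies x_j' x_j = n
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Componentwise ("individual") regressions: slope of y on each x_j alone, no intercept
beta_ind_loop = np.array([(x @ y) / (x @ x) for x in X.T])
beta_ind = X.T @ y / n                        # same thing, since x_j' x_j = n
print(np.allclose(beta_ind, beta_ind_loop))   # True

# Full OLS via the normal equation
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The linear map taking beta_ind to beta_ols: P = (X'X / n)^{-1}
P = np.linalg.inv(X.T @ X / n)
print(np.allclose(beta_ols, P @ beta_ind))    # True
```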

I was wondering if there is more insight into the nature of this transformation?

Maverick Meerkat
  • Correct me if I'm wrong, but basically you are calling the result of the $X^Ty$ multiplication "multiple individual regression" and asking what $P = (X^TX)^{-1}$ does in OLS? Maybe you would find this thread helpful: https://stats.stackexchange.com/questions/22501/is-there-an-intuitive-interpretation-of-ata-for-a-data-matrix-a/22520#22520 – Tim May 12 '21 at 09:57
  • @Tim $X^Ty$ is (up to 1/n) a vector of individual regression coefficients. I.e., regressing $y=\beta_j x_j$ only. – Maverick Meerkat May 12 '21 at 10:00
  • What do you mean by "individual regression coefficients" here? Moreover, it is not clear to me what exactly your question is. 1/n is a constant, so it doesn't change much about the computations. You should probably add a reference for "Sure Independence Screening", since this is not a standard statistical method. – Tim May 12 '21 at 10:09
  • I added a reference to the original paper, and more explanation. – Maverick Meerkat May 12 '21 at 16:17

1 Answer


Edit: I think my old answer is a bit inaccurate. First of all, regarding my question: $(X^TX)^{-1}$ is obviously not a projection matrix, by the mere fact that $P^2 \neq P$ in general.

Second, there seems to be a bit of confusion because of the standardizing/normalizing. If I regress on only one covariate, without an intercept, I get $\hat\beta_j=(x_j^Tx_j)^{-1}x_j^Ty$. If I do this for each covariate separately, I get what I called the "individual regression", i.e., in matrix form: $$\hat\beta_{ind}=\begin{pmatrix} x_1^Tx_1 & \dots &0 \\ \vdots &\ddots & \vdots \\ 0 & \dots & x_p^Tx_p \end{pmatrix}^{-1}X^Ty$$ That is, it's as if we were assuming that the covariance between the covariates is 0, i.e., that they are uncorrelated, which in reality, of course, is not true. Compare this to the full regression, which doesn't assume this: $$\hat\beta=\begin{pmatrix} x_1^Tx_1 & \dots &x_1^Tx_p \\ \vdots &\ddots & \vdots \\ x_p^Tx_1 & \dots & x_p^Tx_p \end{pmatrix}^{-1}X^T y $$ I'm not sure this can be broken down into some matrix times $\hat\beta_{ind}$ in general...

In the case where we standardize the columns of $X$, this is possible: $\hat\beta_{ind}$ reduces to $\frac{1}{n}X^Ty$, and $\hat\beta$ can be written as $\left(\frac{1}{n}X^TX\right)^{-1}\hat\beta_{ind}$.
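Both versions are easy to check numerically; here is a short NumPy sketch (simulated data, first unstandardized, then standardized):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
X[:, 1] += 0.8 * X[:, 0]                      # correlated columns, not standardized
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=n)

# Individual no-intercept regressions, one covariate at a time
beta_ind = np.array([(x @ y) / (x @ x) for x in X.T])

# Same thing written with only the diagonal of X'X, as above
D = np.diag(np.diag(X.T @ X))                 # keeps x_j' x_j, drops the cross terms
print(np.allclose(beta_ind, np.linalg.solve(D, X.T @ y)))   # True

# Full OLS uses the whole X'X, cross terms included, and generally differs
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_ind, beta_ols))        # False here, because the columns are correlated

# Standardized case: each column now has x_j' x_j = n
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
beta_ind_s = Xs.T @ y / n                     # componentwise coefficients
beta_ols_s = np.linalg.solve(Xs.T @ Xs, Xs.T @ y)
print(np.allclose(beta_ols_s, np.linalg.inv(Xs.T @ Xs / n) @ beta_ind_s))  # True
```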


Old post: So, this is what I think:

  • $X^Ty$ finds the individual regression, if $X$ is normalized. It is also the complete regression in an orthogonal design (i.e., if $X^TX=I$).
  • $(X^TX)^{-1}$ actually normalizes the $x_j$'s anyway, i.e., $(X^TX)^{-1}X^Ty$ will be normalized. You can see this clearly if you take the columns of $X$ to be orthogonal but not orthonormal: $X^TX$ will be a diagonal matrix, but without 1's on the diagonal. Taking its inverse and multiplying by $X^Ty$, we again get the individual regression.
  • This means that if $X$ has no correlation between the features and is normalized, then $X^Ty$ reveals the coefficients.
  • If $X$ has features with a positive correlation, then $X(X^TX)^{-1}$ has features with a negative correlation, and vice versa.
  • I would expect that $(X^TX)^{-1}$ also serves to de-correlate the structure of the $X$'s to a new space $X^*=X(X^TX)^{-1}$, and in this new space we use individual regression to recover the coefficients.
    • The thing that bothers me is why isn't ${X^*}^TX^*=I$?
    • Maybe it's a two-way trip: $(X^TX)^{-1}X^Ty$ goes to this new space, performs individual regression there, and then comes back. Perhaps using the SVD we can see this: $$X = UDV' \Rightarrow (X^TX)^{-1}X^Ty = VD^{-1}U'y$$ where $U'y$ is the individual regression for the $U$'s, $D^{-1}$ is the normalization, and $V$ is the projection back?
    • It is true that if you regress $y$ on $U$, the individual regression equals the regular regression, which is not so surprising given that the columns of $U$ are orthonormal.

So in the end, the difference between componentwise regression, $\hat\beta_{ind} = VDU'y$, and normal regression, $\hat\beta = VD^{-1}U'y$, is that the $D$ is inverted.
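A quick numerical check of the SVD claims (simulated data with no particular structure):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

U, d, Vt = np.linalg.svd(X, full_matrices=False)    # X = U diag(d) V'

# OLS through the SVD: beta = V D^{-1} U'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_ols, Vt.T @ (U.T @ y / d)))  # True

# Componentwise form X'y = V D U'y: same pieces, but with D instead of D^{-1}
print(np.allclose(X.T @ y, Vt.T @ (U.T @ y * d)))   # True

# Regressing y on the orthonormal columns of U: individual = full regression
print(np.allclose(np.linalg.solve(U.T @ U, U.T @ y), U.T @ y))  # True, since U'U = I
```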

Maverick Meerkat