
I know that orthogonalization in least squares is used to avoid inverting $X'X$. The idea behind it is to find variables $Z$ that are orthogonal to each other. Although the process of finding them is clear to me, I don't get how the coefficients are found. The algorithm sets $Z_1=X_1$ (it leaves the first transformed variable as the original), and then calculates the coefficients in the new space of $Z$: $\alpha_i=\frac{\langle Y,Z_i\rangle}{\|Z_i\|^2}$ (point 4 in the pseudo-code). Next, $Z_i=X_i-\sum_{j<i}\frac{\langle Z_j,X_i\rangle}{\|Z_j\|^2}Z_j$ (point 5 in the pseudo-code). That is OK; I understand that.
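To make sure I follow the orthogonalization step, here is a minimal NumPy sketch of what I understand it to do (the data and variable names are made up by me, they are not from the pseudo-code):

```python
import numpy as np

# Made-up example data: n observations, M predictors (not from the pseudo-code)
rng = np.random.default_rng(0)
n, M = 50, 3
X = rng.normal(size=(n, M))
Y = rng.normal(size=n)

# Successive orthogonalization: Z_1 = X_1, and each later Z_i is X_i with
# its projections onto the earlier Z_j removed (Gram-Schmidt).
Z = X.copy()
for i in range(1, M):
    for j in range(i):
        Z[:, i] -= (Z[:, j] @ X[:, i]) / (Z[:, j] @ Z[:, j]) * Z[:, j]

# Coefficients in the orthogonal basis: alpha_i = <Y, Z_i> / ||Z_i||^2
alpha = np.array([(Y @ Z[:, i]) / (Z[:, i] @ Z[:, i]) for i in range(M)])
```

After this loop the columns of `Z` are mutually orthogonal, so each $\alpha_i$ can be computed independently without inverting anything.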

The thing is that to get the $\beta$ (the coefficients in the original space), the algorithm computes (point 10 in the pseudo-code):

$\beta_m=\alpha_m-\sum_{j=m+1}^{M}\frac{\langle X_j,Z_m\rangle}{\|Z_m\|^2}\,\beta_j$.

It starts with the last $Z$, so for $m=M$ we get $\beta_M=\alpha_M$ (because the sum runs from $j=m+1$ to $M$ and is therefore empty).

Finally, when it reaches $\beta_1$, it is different from $\alpha_1$. But we set $Z_1=X_1$, so I don't understand why $\beta_1\ne\alpha_1$.
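To check this numerically, here is a full sketch of my understanding of the algorithm (NumPy, made-up data, my own variable names). The back-substitution reproduces the ordinary least-squares coefficients on the original $X$, and indeed $\beta_M=\alpha_M$ while $\beta_1$ comes out different from $\alpha_1$:

```python
import numpy as np

# Made-up data (not from the pseudo-code): n observations, M predictors
rng = np.random.default_rng(1)
n, M = 50, 3
X = rng.normal(size=(n, M))
Y = rng.normal(size=n)

# Forward pass: orthogonalize the predictors and get the alpha coefficients
Z = X.copy()
for i in range(1, M):
    for j in range(i):
        Z[:, i] -= (Z[:, j] @ X[:, i]) / (Z[:, j] @ Z[:, j]) * Z[:, j]
alpha = np.array([(Y @ Z[:, i]) / (Z[:, i] @ Z[:, i]) for i in range(M)])

# Back-substitution, from m = M down to 1:
#   beta_m = alpha_m - sum_{j > m} beta_j * <X_j, Z_m> / ||Z_m||^2
# The sum is empty for m = M, so beta_M = alpha_M.
beta = alpha.copy()
for m in range(M - 2, -1, -1):
    for j in range(m + 1, M):
        beta[m] -= beta[j] * (X[:, j] @ Z[:, m]) / (Z[:, m] @ Z[:, m])

# beta matches a direct least-squares fit on the original X.
# alpha_1 is the coefficient of Y on Z_1 = X_1 *alone* (a simple
# regression), whereas beta_1 is the multiple-regression coefficient,
# which also accounts for the components of the later X_j along X_1.
beta_ls = np.linalg.lstsq(X, Y, rcond=None)[0]
```

So even though $Z_1=X_1$, the later columns $X_j$ still contain a component along $Z_1$, and the back-substitution removes it, which is why $\beta_1\ne\alpha_1$ in general.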

Can anyone give me an idea of why? Thanks! The algorithm is shown below (the image is in Spanish, but you can tell what it's doing):

[image: pseudo-code of the orthogonalization algorithm, in Spanish]

GabyLP
    Can you say more about the process you are asking about, or provide a reference regarding this estimator? I've never heard of it. – gung - Reinstate Monica Jul 21 '14 at 00:40
  • You have the pseudo code, the question, the reasoning. What else do you need? – GabyLP Jul 21 '14 at 13:08
  • Gaby, your question is hard to follow but I will take a chance. With Gram-Schmidt you are getting $z_1, \ldots, z_n$, an orthonormal basis of the space spanned by the columns of $x$. Then what you need to know is that $\beta_j = Pr_{z_j}(y)$; that is, you get the value by projecting over the elements of the basis. That said, you should put more effort into posing your question. People here are very helpful, but you need to do your part. – Manuel Jul 21 '14 at 14:46
  • OK Manuel, thanks for your effort. I added some clarifications, and linked to the part of the code. What I don't understand is the intuition of the last part (obtaining the original coefficients) and why $\alpha_1$, corresponding to $x_1=z_1$, is not $\beta_1$. – GabyLP Jul 21 '14 at 15:48
  • There are some differences between your writing and your image that I've left unchanged within the $\LaTeX$. You may want to compare them and fix them yourself if they're important. E.g., $\beta_m-1=\alpha_m-1-\frac{\sum}{{\rm norm}(Z_m-1)^2}$ instead of $\beta_{m-1}=\alpha_{m-1}-\frac{\sum}{||Z_{m-1}||^2}$. You can right-click these to see the $\TeX$ commands if you like ("Show Math As"). – Nick Stauner Jul 21 '14 at 19:00
  • I still cannot fathom what you are asking. Although I can read and understand the algorithm--it appears to be a partial Gram-Schmidt orthogonalization run in parallel with a least squares fit--it makes only the vaguest references to "sufficiently good variables" and I cannot understand what you mean by "doing $Z_1=X_1$" or by the question about $\alpha_1$ and $\beta_1$. Perhaps you could show us a worked example ($r=2$ dimensions ought to suffice) to illustrate? I suspect it is identical to the "matching" procedure for multiple regression I describe at http://stats.stackexchange.com/a/46508. – whuber Jul 22 '14 at 16:28

1 Answer


(While not presently an answer, this is too long for a comment; I plan to either edit it into an answer when the question improves, or eventually to delete it)

You have several terminology problems that render this question almost unanswerable, and certainly confusing for many readers.

1) "OLS" is almost universally understood to mean 'ordinary least squares', not 'orthogonal least squares'.

2) Orthogonal least squares, as conventionally understood, is not what you seem to be talking about.

You seem to be instead talking about linear transformations to orthogonalize the predictors, perhaps something like a Gram-Schmidt algorithm.

What are you doing? What is the algorithm you're trying to describe? What language is your code supposed to be in? Please clarify your question via edits, taking care to be as clear as you can.

Glen_b
  • 1) That's why I wrote "orthogonal". 2) Yes, Gram-Schmidt. 3) I added the pseudo-code. Thanks. – GabyLP Jul 21 '14 at 03:45