
I have been reading "Generalized Additive Models: An Introduction with R" by Simon Wood and have come across a section I'm having trouble with. On page 13 it is stated that the model or design matrix

$X$ can always be decomposed into $$X = Q \left[ \begin{matrix}{R}\\{0}\end{matrix}\right] = Q_fR$$

where $R$ is a $p \times p$ upper triangular matrix and $Q$ is an $n \times n$ orthogonal matrix, the first $p$ columns of which form $Q_f$. Recall that orthogonal matrices rotate vectors, but do not change their length. Orthogonality also means that $QQ^t = Q^tQ = I_n$. Applying $Q^t$ to $y-X\beta$ implies that $$\|y-X\beta\|^2=\|Q^ty-Q^tX\beta\|^2=\left\|Q^ty-\left[\begin{matrix}R\\0\end{matrix}\right]\beta\right\|^2 $$
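(As a quick numerical check of the length-preservation claim, here is a minimal NumPy sketch, not from the book; the data are made up:)

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))          # n = 5, p = 2
Q, _ = np.linalg.qr(X, mode="complete")  # full QR: Q is 5 x 5 orthogonal

v = rng.standard_normal(5)
# Multiplying by Q (or Q^t) rotates v but leaves its Euclidean norm unchanged
print(np.allclose(np.linalg.norm(Q.T @ v), np.linalg.norm(v)))  # True
# And Q Q^t = Q^t Q = I_n
print(np.allclose(Q @ Q.T, np.eye(5)))                          # True
```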

Writing $Q^ty=\left[\begin{matrix}f\\r\end{matrix}\right]$, where $f$ is a vector of dimension $p$, and hence $r$ is a vector of dimension $n-p$, yields

$$ \|y-X\beta\|^2=\left\|\left[\begin{matrix}f\\r\end{matrix}\right] - \left[\begin{matrix}R\\0\end{matrix}\right]\beta \right\|^2=\|f-R\beta\|^2+\|r\|^2 $$

The length of $r$ does not depend on $\beta$, while $\|f - R\beta\|^2$ can be reduced to zero by choosing $\beta$ so that $R\beta$ equals $f$. Hence $$ \hat\beta=R^{-1}f $$ is the least squares estimator of $\beta$. Notice that $\|r\|^2=\|y-X\hat\beta\|^2$, the residual sum of squares of the model fit.
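(The whole derivation can be traced numerically; this is a NumPy sketch with made-up data, not the book's code:)

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

Q, R_full = np.linalg.qr(X, mode="complete")  # Q: n x n, R_full: n x p
R = R_full[:p, :]                             # upper p x p triangular block

Qty = Q.T @ y
f, r = Qty[:p], Qty[p:]                       # f: p-vector, r: (n-p)-vector

beta_hat = np.linalg.solve(R, f)              # beta-hat = R^{-1} f

# Agrees with ordinary least squares...
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_ls))         # True

# ...and ||r||^2 is the residual sum of squares of the fit
rss = np.sum((y - X @ beta_hat) ** 2)
print(np.allclose(np.sum(r ** 2), rss))       # True
```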

I find this a bit confusing for a number of reasons. First, it seems to me that $Q$ will only be $n\times n$ if $X$ is $n\times n$. However, in general $Q$ will be $n\times p$, where the number of parameters $p$ is less than the number of subjects/cases $n$. It is also stated that the first $p$ columns of $Q$ make up $Q_f$. In order to multiply $Q$ and $R$, $Q$ must contain $p$ columns. Second, if we had a simple design matrix of dimension $2\times 4$ (i.e. we have an intercept and slope parameter and 4 subjects), $y$ would be $4\times 1$ and $Q^t$ $2\times 4$, so $Q^ty$ will be $2\times 1$. Thus it is not clear to me how $Q^ty =\left[\begin{matrix}f\\r\end{matrix}\right]$, where $f$ is a vector of dimension $p$, and hence $r$ is a vector of dimension $n-p$. In this case $f$ would have dimension 2 and $n-p=2$, which would give $Q^ty$ a dimension of $4\times 1$.

I have checked the book's errata and there are no mistakes listed for this section, so I must not be thinking about this quite right. If anyone could shed some light on what piece of this puzzle I'm missing, I'd greatly appreciate some enlightenment.

Ian
    You're mistaken. $Q$ *is* $n\times n$. However, $QR$ is of the same dimension as $X$, $n\times p$. The $p \times p$ dimension given for $R$ in the text is what is called 'little R'; the full dimension of $R$ is $n\times p$ but the lower part is zero. – Glen_b Jan 31 '14 at 22:03
  • It looks like there are two versions of the QR factorization. I was taught the "skinny" version, where $A$ is $m\times n$, $Q$ is $m\times n$ and $R$ is $n\times n$. See http://www4.ncsu.edu/eos/users/w/white/www/white/ma580/chap3.3.PDF – Ian Feb 01 '14 at 01:04
  • I too struggled with reconciling the usual definitions of QR decomposition and the resulting matrices with this presentation. I finally stumbled upon Wood's approach in the documentation for his R package gamair. Look for "Q.11" under "ch1.solutions" in the linked [pdf](https://cran.r-project.org/web/packages/gamair/gamair.pdf). – M. Todd Dec 05 '17 at 17:55

1 Answer


First, rewriting the $QR$ decomposition in a (hopefully) clearer way by showing the partitioning of $Q$ and explicitly giving the dimensions of the $0$ matrix:

$$ X=QR=[Q_f,Q_g]\left[\begin{array}{c}R_1\\0_{(n-p)\times p}\end{array}\right]=Q_fR_1 $$

So I think part of the confusion here is that in the book you are referring to, $R$ is not the $R$ of the $QR$ decomposition but actually only the first $p$ rows, what I have denoted $R_1$.

In your example, $X$ should be $4 \times 2$, not $2 \times 4$ as you've stated. So $n=4$, $p=2$. Then, yes, $y$ is $4 \times 1$, but $Q$ is $n\times n$ and so $4 \times 4$ as is $Q^{t}$. Then $Q^{t}y$ is $4 \times 1$ which is consistent with $f$ and $r$.
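(The "full" versus "skinny" distinction raised in the comments can be seen directly in NumPy, which exposes both; this is an illustrative sketch with a made-up $4 \times 2$ design matrix, matching your $n=4$, $p=2$ example:)

```python
import numpy as np

# Design matrix with an intercept column and one covariate: n = 4, p = 2
X = np.column_stack([np.ones(4), np.array([1.0, 3.0, 5.0, 7.0])])

Qf, R1 = np.linalg.qr(X, mode="reduced")   # skinny: Qf is 4 x 2, R1 is 2 x 2
Q,  R  = np.linalg.qr(X, mode="complete")  # full:   Q  is 4 x 4, R  is 4 x 2
print(Qf.shape, R1.shape, Q.shape, R.shape)  # (4, 2) (2, 2) (4, 4) (4, 2)

# The lower (n-p) x p block of the full R is zero, so both
# factorizations reconstruct the same X
print(np.allclose(R[2:, :], 0.0))                  # True
print(np.allclose(Q[:, :2] @ R[:2, :], Qf @ R1))   # True, both equal X
```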

M. Berk
  • I see my typo, thanks for pointing it out. If $R$ is $p \times p$, then the first $p$ rows of $R$ is all of $R$, right? But it looks like what you are saying is that the $Q$ above is an augmented matrix of $Q_f$ joined with $Q_g$, and $R$ essentially just has more zeros beneath the upper triangular $R$ normally found in the QR decomposition. Am I understanding you? – Ian Jan 31 '14 at 22:47