I have been reading "Generalized Additive Models: An Introduction with R" by Simon Wood and have come across a section I'm having trouble with. On page 13 it is stated that the model or design matrix
$X$ can always be decomposed into $$X = Q \left[ \begin{matrix}{R}\\{0}\end{matrix}\right] = Q_fR$$
where $R$ is a $p \times p$ upper triangular matrix and $Q$ is an $n \times n$ orthogonal matrix, the first $p$ columns of which form $Q_f$. Recall that orthogonal matrices rotate vectors, but do not change their length. Orthogonality also means that $QQ^t = Q^tQ=I_n$. Applying $Q^t$ to $y-X\beta$ implies that $$\|y-X\beta\|^2=\|Q^ty-Q^tX\beta\|^2=\left\|Q^ty-\left[\begin{matrix}R\\0\end{matrix}\right]\beta\right\|^2 $$
Writing $Q^ty=\left[\begin{matrix}f\\r\end{matrix}\right]$, where $f$ is a vector of dimension $p$, and hence $r$ is a vector of dimension $n - p$, yields
$$ \|y-X\beta\|^2=\left\|\left[\begin{matrix}f\\r\end{matrix}\right] - \left[\begin{matrix}R\\0\end{matrix}\right]\beta \right\|^2=\|f-R\beta\|^2+\|r\|^2 $$
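Spelling out the last equality, as far as I understand it: the two stacked blocks are subtracted block by block, and the squared norm of a stacked vector is the sum of the squared norms of its blocks, so
$$ \left\|\left[\begin{matrix}f\\r\end{matrix}\right] - \left[\begin{matrix}R\\0\end{matrix}\right]\beta \right\|^2=\left\|\left[\begin{matrix}f-R\beta\\r\end{matrix}\right]\right\|^2=\|f-R\beta\|^2+\|r\|^2 $$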
The length of $r$ does not depend on $\beta$, while $\|f - R\beta\|^2$ can be reduced to zero by choosing $\beta$ so that $R\beta$ equals $f$. Hence $$ \hat\beta=R^{-1}f $$ is the least squares estimator of $\beta$. Notice that $\|r\|^2=\|y-X\hat\beta\|^2$, the residual sum of squares of the model fit.
I find this a bit confusing for a number of reasons. First, it seems to me that $Q$ will only be $n\times n$ if $X$ is $n\times n$. However, in general $Q$ will be $n\times p$, where the number of parameters $p$ is less than the number of subjects/cases $n$. It is also stated that the first $p$ columns of $Q$ make up $Q_f$. In order to multiply $Q$ and $R$, $Q$ must contain $p$ columns. Second, if we had a simple design matrix of dimension $4\times 2$ (i.e. we have an intercept and a slope parameter and 4 subjects), $y$ would be $4\times 1$ and $Q^t$ would be $2\times 4$, so $Q^ty$ would be $2\times 1$. Thus it is not clear to me how $Q^ty =\left[\begin{matrix}f\\r\end{matrix}\right]$, where $f$ is a vector of dimension $p$ and hence $r$ is a vector of dimension $n - p$. In this case $f$ has dimension $p = 2$ and $r$ has dimension $n-p=2$, which would give $Q^ty$ a dimension of $4\times 1$.
I have checked the book's errata and there are no mistakes listed for this section, so I must not be thinking about this quite right. If anyone could shed some light on what piece of this puzzle I'm missing, I'd greatly appreciate some enlightenment.
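To make the dimension count in my example concrete, here is a minimal sketch. It assumes NumPy and its default (reduced) `np.linalg.qr` factorization; the data values and the names `Q_f`, `f` and `beta_hat` are made up for illustration and are not taken from the book.

```python
import numpy as np

# Made-up data: 4 subjects, an intercept and one slope (n = 4, p = 2).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8])
X = np.column_stack([np.ones_like(x), x])  # design matrix, 4 x 2

# NumPy's default is the reduced factorization X = Q_f R.
Q_f, R = np.linalg.qr(X)                   # Q_f is 4 x 2, R is 2 x 2
f = Q_f.T @ y                              # length p = 2; no r block appears

# Solve R beta = f (R is upper triangular and invertible here).
beta_hat = np.linalg.solve(R, f)

print(Q_f.shape, R.shape, f.shape)
print(beta_hat)
```

With this factorization the product `Q_f.T @ y` has only $p$ entries, which is exactly the $2\times 1$ vector I describe above, and I do not see where an $r$ of dimension $n - p$ would come from.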