
This may seem a trivial question, but I haven't found a satisfactory answer anywhere. I need to compute standard errors in an OLS regression $y = X\beta + u$ in R from scratch. How can I do this without inverting the $X'X$ matrix? (The reason is that speed is important.) Suppose that I have already factorized $X'X$ by Cholesky and already have $\hat \beta$ and $\hat \sigma$. Thanks!
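
For concreteness, here is a minimal sketch of the setup (toy data; the variable names are just illustrative):

```r
# Toy setup: OLS via an existing Cholesky factor of X'X, no explicit inverse yet.
set.seed(1)
n <- 200; k <- 4
X <- cbind(1, matrix(rnorm(n * (k - 1)), n))
y <- X %*% rnorm(k) + rnorm(n)

R_chol <- chol(crossprod(X))                 # upper-triangular R with X'X = R'R
beta   <- backsolve(R_chol, forwardsolve(t(R_chol), crossprod(X, y)))
sigma2 <- sum((y - X %*% beta)^2) / (n - k)  # \hat\sigma^2

# Question: how to get se(beta) = sqrt(sigma2 * diag((X'X)^{-1}))
# from R_chol without calling solve(crossprod(X))?
```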

  • R calculates the standard errors without inverting the matrix; it uses a QR decomposition. See pages 2 and 3 of http://cran.r-project.org/doc/contrib/Leisch-CreatingPackages.pdf – vinux Jan 03 '12 at 14:21
  • Yes, please don't use Cholesky for OLS, especially if the data matrix is of any appreciable size. The accumulation of numerical roundoff error is often practically significant. – cardinal Jan 03 '12 at 15:10
  • You might be able to capitalize on special circumstances to speed up the calculation. See, for instance, http://stats.stackexchange.com/q/6920 – whuber Jan 03 '12 at 15:29
  • Thanks for the help. My X matrix is small, so Cholesky may be OK. The example given in the pdf is very nice, but I fail to understand what exactly `chol2inv` is doing with the R part of the QR decomposition. Any hints on that? :) – newbie3 Jan 03 '12 at 15:35
  • If $X = QR$, then $X^T X = R^T Q^T Q R = R^T R$. Recall that the covariance matrix of linear regression is $\sigma^2 (X^T X)^{-1}$. But $R$ is an upper triangular matrix, so $(R^T R)^{-1}$ can be found by one backward and one forward substitution. – cardinal Jan 03 '12 at 15:45
  • @cardinal, thanks; although I'm slow, I figured that out myself. :) How do I perform the backward/forward substitution myself, though, without a loop? Can I use `solve()` for this? – newbie3 Jan 03 '12 at 15:49
  • Only slightly tongue-in-cheek: by using `chol2inv` or `qr.solve`. – cardinal Jan 03 '12 at 15:52
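
Putting the QR / `chol2inv` suggestion from the comments into code, a minimal self-contained sketch (toy data for illustration):

```r
# Standard errors via QR: (X'X)^{-1} is obtained from the triangular R factor,
# so X'X is never inverted explicitly.
set.seed(1)
n <- 200; k <- 4
X <- cbind(1, matrix(rnorm(n * (k - 1)), n))
y <- X %*% rnorm(k) + rnorm(n)

qx     <- qr(X)                       # QR decomposition, as lm() uses internally
beta   <- qr.coef(qx, y)              # \hat\beta without forming (X'X)^{-1}
sigma2 <- sum((y - X %*% beta)^2) / (n - k)

XtX_inv <- chol2inv(qr.R(qx))         # (X'X)^{-1} from R alone; also works on chol(X'X)
se      <- sqrt(sigma2 * diag(XtX_inv))

# Cross-check: summary(lm(y ~ X - 1))$coefficients[, "Std. Error"]
```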

1 Answer


I had the same problem, as I wanted to use the most efficient solvers available in my econometrics package. I developed an algorithm that solves for both $\beta$ and the inverse of the normal matrix in linear least squares (it also applies to WLS, ridge regression, etc.).

Here is the pseudo-code. Given $X$ and $y$:

1) Use an efficient solver to estimate $\beta$.

2) Store $X^{\top}X + \Gamma \Gamma^{\top}$ (with $\Gamma = 0$ for plain OLS; $\Gamma$ is the penalty matrix for ridge regression).

3) Create a new $Ax=b$ problem with:

  • $x$ = the vector of values that fully characterizes $\left(X^{\top}X + \Gamma \Gamma^{\top}\right)^{-1}$. The matrix is symmetric, so there are $\frac{k (k + 1)}{2}$ unique entries.

  • $b$ = the vector obtained by stacking $\beta$ and $\operatorname{vec}(I_{k})$

  • $A$ = the matrix that encodes the following restrictions:

    • $\left(X^{\top}X + \Gamma \Gamma^{\top}\right)^{-1} X^{\top} y = \beta$

    • $\left(X^{\top}X + \Gamma \Gamma^{\top}\right)^{-1}\left(X^{\top}X + \Gamma \Gamma^{\top}\right) = I$

Lastly, solve the new $Ax=b$ for $x$ using an efficient solver and reconstruct $\left(X^{\top}X + \Gamma \Gamma^{\top}\right)^{-1}$ from it.

This way, you don't have to compute a Cholesky decomposition or an explicit inverse. A good solver could be LSMR, which can give more precision and a more efficient implementation.
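
The following is not the answer's exact construction of $A$, but a minimal R sketch of the same idea: obtain $\beta$ and the columns of $\left(X^{\top}X + \Gamma \Gamma^{\top}\right)^{-1}$ from linear solves that share a single factorization, then read off the standard errors. The toy data and $\Gamma = 0$ (i.e. plain OLS) are assumptions made here for illustration.

```r
# Solve for beta and the columns of (X'X + Gamma Gamma')^{-1} in one shot,
# rather than calling an explicit matrix-inverse routine.
set.seed(42)
n <- 500; k <- 5
X <- cbind(1, matrix(rnorm(n * (k - 1)), n))
y <- X %*% rnorm(k) + rnorm(n)
Gamma <- matrix(0, k, k)                   # 0 = plain OLS; nonzero gives ridge

N   <- crossprod(X) + tcrossprod(Gamma)    # X'X + Gamma Gamma'
rhs <- cbind(crossprod(X, y), diag(k))     # [X'y | I_k]: one factorization serves both
sol <- solve(N, rhs)

beta   <- sol[, 1]                         # \hat\beta
Ninv   <- sol[, -1]                        # (X'X + Gamma Gamma')^{-1}
sigma2 <- sum((y - X %*% beta)^2) / (n - k)
se     <- sqrt(sigma2 * diag(Ninv))        # standard errors
```

An iterative solver such as LSMR could replace the dense `solve()` call for large or sparse problems, as the answer suggests.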