3

I am using a linear regression, yet the only output I need is the Sum of Squared Residuals (SSR), I don't care about the coefficients. (Context is a non-linear LS, which is linear given an extra parameter, so I am running a grid search of least-squares SSR)

Is there a fast algorithm to extract the SSR only, with the QR (or other) decomposition? In matrix form:

$$SSR = Y'PY = Y'(I - X(X'X)^{-1}X')Y$$ and with the QR decomposition, $X = QR$, one has $X(X'X)^{-1}X'=QQ'$, which suggests one does not need to do a matrix inversion at all, though on the other side, one needs to grow a $n \times n$ matrix... SO:

  1. Is there a fast algorithm to obtain the SSR only, or am I better sticking with usual function, that compute beta, and also the residuals

  2. Is there such an implementation in R, (or linpack/fortran, called by R?). I saw qr.resid(), but it seems to be slower than the lm.fit(), that returns everything? See benchmark below (changes when X is larger though).

Thanks!

data(EuStockMarkets)
X <- EuStockMarkets[1:100, 1:3]
Y <- EuStockMarkets[1:100, 4]


library(microbenchmark)  
microbenchmark(lm.fit = c(crossprod(lm.fit(X, Y)$residuals)),
               .lm.fit = c(crossprod(.lm.fit(X, Y)$residuals)),
               qr_res = c(crossprod(qr.resid(qr(X), Y))),
               times = 10)
#> Unit: microseconds
#>     expr    min     lq    mean  median     uq     max neval
#>   lm.fit 44.566 45.704 97.3347 47.0335 51.143 543.285    10
#>  .lm.fit 11.144 11.995 15.0783 13.1285 13.856  35.594    10
#>   qr_res 49.495 51.841 79.9043 53.4505 55.014 317.989    10

Created on 2018-10-20 by the reprex package (v0.2.1)

Matifou
  • 2,699
  • 14
  • 25
  • "(Context is a non-linear LS, which is linear given an extra parameter, so I am running a grid search of least-squares SSR)" That sounds like you should be using `nls` with `algorithm = "plinear"`. – Roland Oct 22 '18 at 06:22
  • oh thanks @Roland, I wasn't aware of this! It turn out that in my context, the non-linear parameter is discrete, so I suspect that the `nls()` function will not handle this. But I should look at whether the `nls()` or the paper cited (Golub and Pereyra (1973) ) did offer an efficient way to compute the SSR only!? – Matifou Oct 22 '18 at 15:54
  • Maybe you could provide a reproducible example with your actual model? Brute-force grid search is not efficient by definition. – Roland Oct 23 '18 at 06:01
  • It's called a changepoint/threshold model, where the parameter to optimize over is truly discrete, so brute force is an accepted method. Hence finding a fast way to obtain the SSR would be great! – Matifou Oct 30 '18 at 16:00

0 Answers0