
I have the following R code:

library("forecast")
library("rbenchmark")

v <- c(95.3, 96.8, 97.2, 97.9, 98.2, 98.5, 99.3, 99.9, 102.7, 104.2, 107.2, 109.3, 109.8, 110.5, 111.2, 111.6, 112.6, 113.5, 114.3, 115, 117.4, 119, 119.3, 120.1, 120.7, 121.1, 121.9, 122.2, 122.9, 124.6, 129.1, 128.8, 129.3, 130.3, 131.1, 131.8, 132.9, 133.5, 134.5, 135.6, 139.4, 138.9, 140.9, 141.8, 142.8, 142.4, 142.8, 143.9, 144.8, 145.5, 149.6, 151, 152.1, 152.7, 154.5, 153.9, 153.8, 153.8, 154.5, 155.5, 158.9, 159, 158.7, 159.7, 159.7, 159.7, 160.1, 159.9, 161.3, 161.9, 164.4, 164.4, 164.7, 165.1, 165.2, 164.9, 166.9, 167.8, 169.4, 170.1, 171.6, 172.9, 173.7, 175.2, 175.8, 176.3, 177.1, 177.5, 178.8, 180.2, 183, 184, 184.7, 186.5, 187.3, 187.9, 187.9, 188.7, 190.2, 191.8, 199, 199.9, 205.4, 205.2, 206.4, 206.2, 208.2, 209.6, 212, 213.4, 218.9, 225, 225.8, 227.1, 227.3, 227, 227.1, 226.7, 229.2, 230.1, 230.2, 230.3, 231.3, 231.9, 232, 231.5, 231.2, 231.3, 234.6, 235.1, 241, 241.6, 242.7, 243.7, 243.1, 242.3, 241.9, 242.3, 244.5, 245.2, 245.1, 245.9, 246.8, 247.8, 248.3, 248.4, 248.4, 248.5, 250.7, 251, 251.3, 252.3, 253.3, 255, 255.3, 255.1, 254.8, 254.5, 256.2, 256.9, 255.6, 255.8, 257, 257.6, 257.3, 256.3, 255.7, 254.5, 256, 255.9, 254.6, 254.2, 255.2, 257, 257, 257.4, 257.3, 257.4, 259.8, 259.6, 256.9, 256.6, 257, 257.7, 258.1, 257.6, 257, 255.7, 256.8, 257.3, 256.2, 256.3, 257.3, 257.9, 258.3, 258.7, 257.6, 257.6, 259.4, 259.7, 257.5, 258.7, 259.9, 260, 261.3, 261.2, 260, 260.2, 262, 262.6, 261.7, 262.6, 264.6, 266.9, 268.7, 268.3, 266.9, 267.6, 269.9, 269.1, 268.8, 269.4, 271.8, 272.9, 273.6, 273.2, 272.3, 272.4, 274.5, 275.4, 276, 278.4, 279.8, 278.8, 278.5, 277.7, 276.8, 276.7, 278.7, 278.9, 278, 277.3, 279.4, 279.4, 280.1, 278.9, 278.5, 278.2, 280.2, 281, 277.9, 279.2, 279.8, 280.2, 280.3, 280.4, 279.4, 279.9, 281.9, 282.4)
t <- ts(v, frequency = 12)
o <- function(a){ 
  accuracy(croston(t,h=12,alpha=a))["MAE"] 
}
startTime <- proc.time()
optimize(o, 0.1, lower=0.1, upper=0.9, tol=0.1)
proc.time() - startTime

This yields an acceptable result, but the optimization process takes around 35 seconds. I really need to get it down to something more like 3 or 4 seconds. I tried playing around with the tolerance, but even with tol=0.5 the optimization still takes 15 seconds (and no longer yields very useful results).

I imagine the optimize function is using something like a Nelder-Mead algorithm. Is there maybe a better/faster optimization algorithm that could be used specifically to optimize the alpha parameter of the croston function?

Stephan Kolassa
James Crosswell

2 Answers


Using

o <- function(a) { accuracy(croston(t, h = 12, alpha = a))[3] }

if you check the execution times of the following constrained optimization algorithms, you'll find that nlminb is the fastest for your problem, with L-BFGS-B a close second; I would recommend one of those two. I also tried some quadratic-approximation procedures (bobyqa), but they were not fast enough here. Given your problem, a trust-region optimizer (nlminb) seems best. I don't know whether you can get analytical derivatives out; in that case I would expect the quasi-Newton method (L-BFGS-B) to be even faster, because at the moment it has to compute the derivatives numerically.

# Golden-section search / parabolic interpolation (base R optimize)
Run1 <- system.time(
  Res1 <- optimize(o, 0.1, lower = 0.1, upper = 0.9, tol = 10e-4))
# PORT routines via nlminb (trust-region style)
Run2 <- system.time(
  Res2 <- nlminb(0.1, o, control = list(abs.tol = 10e-4), lower = 0.1, upper = 0.9))
# Brent's method via optim
Run3 <- system.time(
  Res3 <- optim(0.1, o, control = list(abstol = 10e-4), lower = 0.1, upper = 0.9, method = "Brent"))
# Box-constrained quasi-Newton (L-BFGS-B) via optim
Run4 <- system.time(
  Res4 <- optim(0.1, o, control = list(pgtol = 10e-4), lower = 0.1, upper = 0.9, method = "L-BFGS-B"))
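For a side-by-side look, here is a small sketch that collects the fitted alpha, the objective value (MAE), and the elapsed time from the four runs above into one table; the column names are mine and the numbers will of course vary by machine:

# Gather results from the four runs above; Res1/Run1 come from optimize,
# Res2/Run2 from nlminb, Res3/Run3 and Res4/Run4 from optim.
data.frame(
  method  = c("optimize", "nlminb", "optim/Brent", "optim/L-BFGS-B"),
  alpha   = c(Res1$minimum, Res2$par, Res3$par, Res4$par),
  MAE     = c(Res1$objective, Res2$objective, Res3$value, Res4$value),
  elapsed = c(Run1["elapsed"], Run2["elapsed"], Run3["elapsed"], Run4["elapsed"])
)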

Also, a word of advice: I see you give a hard limit for the execution time of your optimization routine ("I really need to get it down to something more like 3 or 4 seconds"). Is that realistic? On my old computer (Intel T7400 @ 2.16GHz) a single function evaluation usually takes about 5 seconds. You will undoubtedly need some number of function evaluations $n \geq 2$, so don't set yourself a task that simply can't be done.

(EDIT: I just tried the same checks on an Intel i5-2500 CPU @ 3.30GHz; it was roughly twice as fast, so a single evaluation took about 2.5 seconds on average. Nevertheless, I hope you can see that a 3-4 second target is all but ruled out by the computational cost of your original function.)
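A quick way to gauge this on your own machine is to time a single evaluation of the objective and budget the number of evaluations from there (the alpha value 0.5 below is arbitrary; any value in the search interval will do):

# Rough cost of one call to the objective; multiply by the expected
# number of evaluations to estimate the total optimization time.
system.time(o(0.5))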

In general, I would suggest you look at the CRAN Task View on Optimization for an overview of the available packages; there is certainly a chance that some less standard optimization routine suits your task better. Still, try to be realistic: unless you have an extremely fast CPU, you won't be seeing 3-4 second solutions any time soon. Reformulating your original function so that it computes faster is probably your best route to a faster optimization, because your problem is only one-dimensional and your function appears quite smooth, so no sophisticated algorithm will give you a huge edge over what you already have.
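Along those lines, because the problem is one-dimensional and the objective looks smooth, even a coarse grid search is a workable fallback when you need a hard bound on run time; a minimal sketch (the grid spacing of 0.1 is an arbitrary choice):

# Evaluate the objective on a fixed alpha grid and keep the best point.
# The cost is exactly length(alphas) function evaluations, so the run
# time is predictable, at the price of a cruder alpha estimate.
alphas <- seq(0.1, 0.9, by = 0.1)
maes <- vapply(alphas, o, numeric(1))
alphas[which.min(maes)]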

usεr11852
  • Thanks, nlminb seems to run the quickest against that particular set of data... although optimize performed better against some other data I tried. As you say, 3-4 seconds may be unrealistic. I'd love to know how Hyndman optimizes the parameters for his ETS framework so quickly in the `forecast` method of the forecast package... which seemingly tries as many as 30 different exponential smoothing algorithms and optimizes 3 separate parameters against many of these in less time than I can optimize a simple croston's algorithm (also his code)! In any case, thanks for the replies :) – James Crosswell Jul 23 '13 at 20:46
  • Glad I could help. You don't have to wonder; you can always look at the source code of `forecast`. – usεr11852 Jul 23 '13 at 21:42
  • He seems to use [a C++ implementation of Nelder-Mead](https://github.com/robjhyndman/forecast/blob/master/src/etsTargetFunctionWrapper.cpp) rather than R. Actually the biggest impact on performance optimization for this algorithm is the number of data points in the history (training set). With 36 historical data points (3 x 12 months), I can get a nlminb optimization in under 0.5s. 60 points can be optimized in around 1s. I think 5 years worth of monthly historical data should be sufficient for what I'm doing. I guess I'll have to consider weekly forecasts impractical... – James Crosswell Jul 24 '13 at 08:19
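To illustrate the point in the last comment about history length, here is a hedged sketch that fits against only the most recent 36 observations (three years of monthly data); the window length is taken from that comment and is a modelling choice, not a recommendation:

# Optimize alpha against a shortened training history; v and the [3]
# indexing follow the code earlier in the thread.
t_short <- ts(tail(v, 36), frequency = 12)
o_short <- function(a) accuracy(croston(t_short, h = 12, alpha = a))[3]
nlminb(0.1, o_short, lower = 0.1, upper = 0.9)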

Note that the function optimize uses "a combination of golden section search and successive parabolic interpolation" (from the help).

The function optim reduced the execution time by a factor of 7-8 on my side, which is nearly what you're looking for. You might also try the different optimization methods that the function offers to suit your needs. Try:

startTime <- proc.time()
optim(0.5, o, lower=0.1, upper=0.9)
proc.time() - startTime
QuantIbex
  • Thanks for the reply. Using `optim(0.5, o, lower=0.1, upper=0.9)` gives me an error on my side: Error in optim(0.5, o, lower = 0.1, upper = 0.9) : L-BFGS-B needs finite values of 'fn' In addition: Warning message: In optim(0.5, o, lower = 0.1, upper = 0.9) : bounds can only be used with method L-BFGS-B (or Brent) – James Crosswell Jul 23 '13 at 09:30
  • 1
    Apologies, I focused on the execution time and didn't pay attention to the warnings. Is it possible that the problem comes from the `o` function? When I run it with a equal to either 0.2, 0.4, 0.6, and 0.8 the function returns NA. Shouldn't the argument `t` used in call of `croston` be defined in the function `o` or passed as an argument? – QuantIbex Jul 23 '13 at 12:16
  • I second that worry about the definition of the `o` function; for starters, just changing it to `accuracy(croston(t,h=12,alpha=a))[3]` will get rid of most `NA`s. – usεr11852 Jul 23 '13 at 12:38
  • It's a bit difficult to paste extensive blocks of code into comments on CrossValidated, but I tried putting `t` into the actual `o(a)` function and it still doesn't work: Error in optim(0.1, o) : function cannot be evaluated at initial parameters – James Crosswell Jul 23 '13 at 13:23
  • ahem... finally understood/found the problem with the reference to `accuracy(...)['MAE']`. Quite right, this should be `accuracy(...)[3]` instead. – James Crosswell Jul 23 '13 at 20:34
  • I think `accuracy(...)[, 'MAE']` also works – QuantIbex Jul 23 '13 at 20:36
  • @QuantIbex: You are correct, of course it will work. I just suggested `[3]` to keep it as straightforward as possible in terms of computations. We are working with a matrix after all. – usεr11852 Jul 23 '13 at 21:50
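To make the indexing point from the last few comments concrete, a small sketch (it reuses the `t` defined in the question; the alpha value 0.3 is arbitrary): accuracy() returns a one-row matrix with named columns, which is why `['MAE']` fails while `[3]` and `[, 'MAE']` both work.

# accuracy() returns a matrix, not a named vector.
acc <- accuracy(croston(t, h = 12, alpha = 0.3))
acc["MAE"]    # NA: single-bracket character indexing treats the matrix as an unnamed vector
acc[3]        # third element in column order, i.e. the MAE
acc[, "MAE"]  # explicit column lookup by name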