5

I'm working on a forecast for the following data:

data <-
c(1932, 4807, 6907, 8650, 10259, 11374, 8809, 6745, 7429, 
8041, 9740, 10971, 11953, 9227, 7401, 8355, 9681, 10438, 
11092, 11543, 9181, 7428, 8358, 10049, 10938, 12280, 
13063, 10022, 8125, 8763, 9330, 9919, 11309, 12169, 11063, 
10112, 10621, 11506, 12425, 12929, 13025, 10938, 9437, 
9910, 11104, 11985, 13024, 13962, 11900, 9576, 9590, 
10740, 11689, 13084, 13829, 11975, 10224, 10493, 11899, 
12697, 13959, 14415, 11650, 9477, 11166, 12327, 13238, 
13801, 13493, 11118, 9073, 9954, 11077, 12509, 12985, 
13380, 11454, 9265, 10053, 11443, 12132, 13733, 13850, 
11560, 9401, 9921, 11401, 12622, 14224, 14289, 12097, 
9623, 10630, 11572, 12816, 14180, 14125, 11667, 9328, 
9936, 11159, 12536, 13953, 13840, 11430, 9313, 9926, 
11557, 12428, 13802, 13041, 9927, 7448, 9143, 10872, 
12331, 14370, 14496, 13237, 11176, 11936, 12661, 14442, 
15005, 15359, 12871, 10505, 11231, 12078, 13307, 14027, 
14368, 12057, 9965, 10121, 11414, 13375, 14525, 14686, 
12243, 9833, 10722, 11778, 13143, 14844, 14856, 12745, 
9134, 7856, 9429, 11539, 13241, 14324, 12102, 10136, 
11107, 12028, 13999, 15130, 15488, 13379, 11028, 11708, 
13280, 14665, 15362, 15600, 12950, 10716, 10988, 12350, 
14163, 15264, 15724, 13374, 11764, 12711, 13239, 14849, 
15455, 15914, 13541, 10570, 9376, 10132, 11725, 12328, 
13105, 11022, 9710, 10659, 12068, 12890, 14242, 14294, 
11847, 9776, 10681, 12413, 13571, 14344, 14500, 12234, 
9961, 10699, 11626, 13135, 14387, 15282, 13028, 11211, 
11992, 13524, 15131, 15741, 15357, 12489, 9985, 10786, 
11492, 13851, 14509, 14751, 12327, 10023, 11315, 12363, 
13487, 14944, 15006, 12290, 9867, 11540, 12179, 14094, 
14941, 15006, 13585, 10769, 11408, 12634, 14073, 15361, 
15236, 13151, 9580, 8934, 10128, 12475, 13890, 14740, 
12617, 10358, 11648, 12418, 14094, 15127, 15775, 13647, 
11281, 11773, 13407, 15441, 15601, 15951, 13865, 11447, 
12422, 13725, 15766, 16389, 16868, 15221, 12503, 12780, 
14525, 16479, 17032, 17403, 14553, 12484, 13204, 13792, 
14896, 15673, 16332, 14196, 11749, 12977, 13886, 14931, 
15955, 16037, 14082, 11271, 12512, 13942, 16362, 17456, 
17446, 15509, 13069, 13524, 14918, 16161, 17524, 18138, 
14604, 12993, 13763, 14945, 16686, 17717, 17947, 15744, 
13388, 13177, 14588, 16075, 16705, 17074, 14415, 12766, 
13372, 14033, 14300, 12508, 11502, 9391, 7689, 9613, 
12291, 14448, 15075, 15670, 13929, 10989, 11875, 13409, 
15203, 15654, 16150, 13387, 10931, 11492, 12479, 13674, 
14519, 14241, 11685, 9486, 9990, 11440, 12415, 13505, 
12103, 10311, 8267, 7510, 8595, 10620, 11664, 3182, 6241, 
9365, 10965, 12372, 9958, 8088, 9290, 10665, 12132, 12827, 
13040, 10692, 8882, 9538, 10027, 12086, 13276, 13107, 
10680, 9136, 10744, 11733, 13334, 14654, 14830, 12189, 
9613, 11399, 12837, 13661, 15007, 15579, 12268, 9703, 
10627, 12077, 13287, 14459, 14825, 11958, 10049, 11512, 
12770, 13869, 14873, 15233, 12056, 9654, 10386, 11465, 
13354, 14601, 15161, 12324, 9782, 10791, 12502, 14111, 
14914, 15250, 12366, 10333, 11638, 12449, 13518, 14637, 
14756, 12011, 9878, 10976, 12464, 13674, 14979, 15312, 
12106, 10127, 11666, 12843, 13910, 15024, 15333, 12308, 
9992, 11278, 13364, 14966, 15231, 15507, 13744, 11417, 
12232, 14414, 15245, 15988, 15168, 11905, 9165, 10536, 
12570, 14106, 15204, 15509, 12821, 10321, 11282, 13133, 
14174, 15099, 14750, 12817, 10384, 11368, 12994, 14591, 
16154, 15904, 12784, 10737, 11865, 13809, 14721, 15202, 
15322, 12722, 10741, 11991, 13546, 14716, 15817, 15879, 
12679, 10390, 11524, 13140, 14426, 15613, 16212, 13088, 
10720, 11730, 13776, 14477, 15758, 15922, 13119, 9220, 
8372, 10239, 12397, 14740, 15550, 13306, 10833, 11892, 
13630, 15186, 16154, 16678, 12898, 10485, 11313, 13705, 
15572, 16086, 16305, 14129, 11066, 12251, 13830, 15345, 
16550, 16518, 13700, 10890, 12301, 14163, 15890, 16985, 
17544, 15337, 12633, 13383, 12813, 12051, 13149, 13636, 
10914, 9617, 10619, 12224, 13954, 15325, 15473, 12418, 
9730, 11214, 12572, 14565, 15287, 15721, 12519, 10689, 
11662, 13139, 14902, 16374, 16392, 13895, 11777, 12948, 
14326, 15625, 16745, 16980, 13946, 11181, 12665, 13678, 
15269, 16279, 16634, 14399, 11142, 11900, 13800, 14783, 
16626, 16861, 13917, 11228, 12531, 14206, 15773, 16344, 
16930, 13945, 11110, 12427, 14085, 15627, 16854, 17106, 
14677, 10410, 8550, 10626, 13366, 15337, 16460, 13619, 
11630, 12582, 13926, 15297, 16715, 17036, 14063, 11368, 
12246, 14111, 15525, 16900, 17272, 14254, 11961, 13155, 
14579, 16260, 17187, 17919, 15493, 13162, 13771, 15231, 
15836, 16880, 16976, 14728, 12106, 13030, 13848, 15344, 
16475, 17122, 13601, 10921, 12043, 14114, 15846, 16190, 
17125, 13769, 10768, 12336, 13849, 16138, 17507, 18050, 
15492, 12905, 12847, 14181, 15967, 16704, 17762, 14882, 
12591, 13807, 14959, 16933, 17369, 17453, 14351, 11582, 
13102, 14328, 16185, 16321, 16843, 13773, 11053, 12199, 
14147, 14470, 12598, 11916, 9185, 7903, 9742, 12691, 
15153, 15945, 16254, 13630, 11437, 12235, 14040, 15161, 
15995, 16291, 12944, 10947, 12055, 13444, 14852, 16029, 
16361, 13658, 10885, 11604, 13030, 13959, 14291, 14786, 
12002, 9014, 7610, 7426, 9602, 11077, 12544, 11334, 5710, 
9874, 11949, 10321, 8945, 10152, 11821, 13434, 15187, 
15269, 12661, 10699, 12040, 13154, 14149, 15472, 16569, 
13008, 10521, 11674, 13272, 14025, 15803, 16791, 13615, 
11043, 12448, 13929, 15158, 16610, 17520, 13900, 11095, 
11735, 13652, 14939, 16001, 16265, 13371, 11198, 11583, 
13377, 15361, 16420, 16765, 13800, 10866, 12026, 13908, 
14902, 16044, 16807, 13694, 11475, 13009, 14453, 16231, 
17093, 17411, 14433, 12242, 13035, 14304, 16309, 17026, 
16811, 13986, 11812, 13216, 14397, 16026, 17780, 17463, 
14717, 12029, 13046, 14820, 16626, 17564, 17802, 14134, 
13158, 15356, 16573, 16887, 17494, 17326, 13525, 11517, 
12410, 13817, 14933, 16399, 17019, 14008, 11808, 12599, 
14639, 16339, 17521, 17820, 14444, 11530, 13352, 14997, 
16038, 17631, 17614, 15601, 15176, 16930, 17979, 18772, 
19728, 19452, 16272, 14006, 15510, 17299, 17774, 18345, 
19080, 16486, 14242, 15465, 16973, 17971, 19068, 19075, 
15606, 13315, 14784, 16505, 17910, 18586, 18315, 15659, 
13621, 14673, 16037, 17467, 17972, 17676, 15452, 11850, 
10959, 13641, 15217, 16813, 17641, 15404, 13102, 14391, 
15764, 17326, 17715, 17947, 15272, 13078, 13962, 15372, 
18292, 18569, 16427, 13374, 14725, 15957, 17425, 18530, 
19251, 17094, 13711, 15275, 16663, 18254, 19023, 19787, 
16636, 14398, 15392, 16302, 15844, 14301, 14559, 11739, 
10080, 11690, 14352, 16702, 17810, 17898, 15159, 12527, 
14250, 15788, 17012, 18219, 17743, 15183, 12633, 14033, 
15528, 16984, 18041, 18388, 15248, 12831, 14289, 16143, 
17340, 18863, 18597, 15984, 13697, 14653, 16143, 17262, 
17805, 18565, 16147, 14734, 16548, 17410, 18044, 18705, 
18462, 15706, 13242, 14977, 16168, 17683, 18224, 18454, 
15784, 14003, 16605, 18013, 19361, 19204, 18970, 16655, 
12928, 11502, 13233, 15211, 16883, 17454, 15043, 12953, 
14515, 15846, 17501, 18922, 18903, 16175, 13492, 14150, 
15710, 18297, 18872, 19490, 15921, 13935, 14943, 16457, 
18425, 19975, 20440, 17716, 15059, 16086, 17290, 18477, 
19896, 20115, 17580, 15001, 15640, 17915, 18951, 20029, 
20221, 16653, 15063, 15726, 16849, 18121, 18843, 19112, 
16516, 13960, 15255, 16910, 18895, 20091, 20663, 17698, 
15441, 16775, 18158, 19897, 20424, 20111, 17784, 15044, 
16869, 17773, 19783, 21255, 20632, 18081, 15891, 17180, 
18143, 20197, 20926, 20639, 18407, 16313, 16998, 17860, 
19177, 19618, 19919, 17662, 16033, 17439, 18741, 18108, 
16641, 16319, 13221, 11160, 12783, 14876, 16831, 18379, 
18858, 16191, 14632, 16089, 16828, 18169, 19512, 18828, 
17364, 15516, 17065, 18245, 18684, 19472, 19235, 16885, 
14854, 14526, 12921, 12675, 14884, 15284, 13492, 11457, 
5938, 9694, 9429, 9142, 10648, 13235, 15610, 16868, 17364, 
16043, 14497, 15329, 16839, 17548, 18818, 19320, 15884, 
13834, 14748, 15784, 16729, 18274, 19138, 17413, 15394, 
16596, 17853, 18934, 20310, 20165, 18870, 16562, 16823, 
18051, 18816, 20410, 21211, 18551, 16274, 17289, 18317, 
20259, 19993, 19831, 18166, 16517, 17114, 17763, 19011, 
20541, 19974, 18105, 16130, 17422, 18472, 20213, 20721, 
20803, 19250, 16246, 16582, 18410, 19559, 20821, 20412, 
18576, 16272, 16917, 19027, 19917, 20418, 21188, 18382, 
16842, 17911, 19126, 20471, 21120, 20756, 18190, 15873, 
16924, 18468, 19579, 20877, 20726, 18525, 16110, 17480, 
19313, 20323, 20661, 20541, 18284, 16124, 17312, 18361, 
19170, 19945, 20548, 17605, 15973, 17488, 17444, 19086, 
19775, 19827, 17269, 14616, 15690, 16469, 18626, 19288, 
20111, 17769, 15738, 17060, 18885, 20010, 21371, 21541, 
18682, 15971, 16714, 18659, 19934, 21499, 22118, 18952, 
16025, 18120, 18897, 20630, 20286, 21077, 17710, 14857, 
16050, 17877, 19928, 21299, 21202, 18858, 14339, 13172, 
15521, 17434, 19823, 20679, 18288, 16798, 18673, 20628, 
21462, 22720, 22241, 20064, 17327, 18720, 19896, 19710, 
21185, 21916, 19661, 17134, 18027, 19449, 20912, 21234, 
21950, 19495, 17023, 18473, 19080, 20875, 21031, 21492, 
20091, 17511, 18834, 19126, 19922, 21215, 19017, 15506, 
12854, 14605, 16279, 18129, 20043, 21248, 18518, 15467, 
16586, 18277, 18915, 20597, 21244, 19024, 16294, 17234, 
18786, 20960, 21345, 22068, 19774, 17491, 18279, 19809, 
20757, 21618, 22131, 20214, 17581, 18321, 19590, 21486, 
22492, 23194, 20020, 16819, 17892, 18948, 20921, 21696, 
22549, 19559, 16404, 17301, 18659, 20430, 22300, 22569, 
19630, 16800, 17898, 19584, 21190, 21926, 22359, 20157, 
15823, 14136, 15930, 18341, 21044, 21204, 18994, 16973, 
18171, 19378, 20794, 22442, 22144, 19874, 17859, 18703, 
19082, 20781, 21860, 21536, 20172, 18429, 19221, 19824, 
21326, 22504, 23381, 21733, 19231, 20312, 21994, 22609, 
23317, 23074, 22005, 19209, 20734, 22513, 23017, 23698, 
24385, 22512, 19471, 20061, 21235, 22351, 22532, 22869, 
20409, 17908, 18722, 19894, 20960, 21999, 22125, 20797, 
19091, 19910, 20463, 22106, 22737, 22827, 21695, 19498, 
20180, 21204, 22272, 22803, 22808, 20979, 18952, 20365, 
20875, 22944, 23022, 22786, 21284, 19302, 20394, 21144, 
22633, 23511, 23355, 21979, 19988, 20143, 21966, 22574, 
19974, 19410, 15641, 13265, 14880, 16838, 19262, 19941, 
20479, 18929, 17760, 18078, 19055, 20553, 21732, 21671, 
19218, 18485, 18864, 20278, 21120, 21747, 21087, 17982, 
15115, 16518, 16282, 15032, 15658, 14966, 12172, 10336, 
12669, 14238, 14031, 12441, 13313, 11047, 10158, 12438, 
14255, 16434, 17873, 18481, 16360, 14479, 15595, 17392, 
18878, 19999, 19958, 16748, 13852, 14931, 16410, 18097, 
19654, 19480, 16387, 14515, 15205, 16854, 18544, 19510, 
20382, 17838, 14878, 15041, 16661, 19008, 20265, 20947, 
18048, 16472, 16434, 18250, 19571, 21148, 20117, 17788, 
14321, 14996, 15779, 17789, 18804, 18934, 17488, 15095, 
15859, 16691, 18369, 20012, 21073, 18029, 15582, 17247, 
18608, 19783, 20322, 20908, 18221, 15919, 17107, 18404, 
19262, 21741, 21514, 19798, 17410, 17973, 18469, 17910, 
14901)

The ts.plot(data) gives:enter image description here

With this data, I'm looking to forecast the values for the next year. This data is victim to both weekly and yearly seasonality. Due to this, I first attempted to use tbats from the forecast package but received an improper forecast that mirrors that found at http://www.github.com/robjhyndman/forecast/issues/87

Instead, I used the following code:

n<-length(data)
bestfit <- list(aicc=Inf)
bestk <- 0
for(i in 1:20)
{
fit <- auto.arima(data, xreg = fourier(1:n,i,m1) + fourier(1:n,i,m2), max.p=10, max.q=10, max.d=2, stepwise=FALSE, ic="aicc", allowdrift=TRUE)
if(fit$aicc < bestfit$aicc)
{
    bestfit <- fit
    bestk <- i
}
}

k <- bestk

bestfit <- auto.arima(data, xreg = fourier(1:n,k,m1) + fourier(1:n,k,m2), max.p=10, max.q=10, max.d=2, stepwise=FALSE, ic="aicc", allowdrift=TRUE)
accuracy(bestfit)
fc <- forecast(bestfit, xreg = fourier((n+1):(n+365),k,m1) + fourier((n+1):(n+365),k,m2), level = c(50,80,90), bootstrap = TRUE)
plot(fc)

This code is searching for the best ARIMA model through the use of Fourier terms in xreg to capture both seasonality components. This Fourier function is defined (per http://robjhyndman.com/hyndsight/longseasonality/) as:

fourier <- function(t,terms,period)
{
  n <- length(t)
  X <- matrix(,nrow=n,ncol=2*terms)
  for(i in 1:terms)
  {
    X[,2*i-1] <- sin(2*pi*i*t/period)
    X[,2*i] <- cos(2*pi*i*t/period)
  }
  colnames(X) <- paste(c("S","C"),rep(1:terms,rep(2,terms)),sep="")
  return(X)
}

This forecasting gives me the following plot:enter image description here

In looking at this forecast, it seems by my naked eye to be off. Just by observation it appears that my forecast is not properly catching the small, but visible, increasing trend. Instead of being "centered" around the extended trendline, it appears that the forecast is "centered" around the mean of the entire dataset.

First off, am I doing something that is just blatantly wrong? (my mind is a little fuzzy this morning)

If my forecast is correct, how is it that it falls so much below the extended trendline?

Lastly, are there any other suggestions which might be beneficial to my forecasting?

JRW
  • 231
  • 2
  • 4
  • 10
  • It would be useful if you could clean up your code (e.g., there is an empty `else` branch) and explain *what* your code does. In addition, it seems like there may be drops in your time series around the end of each year (Christmas?). If you include dummies for these drops in the `xreg` parameter, `auto.arima` may have an easier time. – Stephan Kolassa Mar 25 '15 at 16:32
  • @StephanKolassa: Just edited my post. You are correct in that the drops in my time series are around Christmas (other smaller drops occur around other holidays). In my limited theoretical knowledge of time series forecasting, I would think that the model itself (with the typical 365.25 day seasonality defined) could accurately account for those holidays that occur on the same date each year. However I very well could be wrong. – JRW Mar 25 '15 at 17:00
  • In theory, yes, Fourier dummies should be able to capture the Christmas and other effects. In practice, this may be similar to giving 10,000 typewriters to 10,000 monkeys and waiting for them to produce *Hamlet* - they will eventually get it done, but they will produce a lot of garbage. Easier to just feed stuff you *know* to the model. I suggest looking at some [seasonal plots](https://www.otexts.org/fpp/2/1), creating relevant dummies by hand and feeding these into the `xreg` parameter. – Stephan Kolassa Mar 25 '15 at 17:12
  • what is the start date, is it daily or weekly data ? – forecaster Mar 25 '15 at 18:47
  • It is daily data that begins on 1/4/2011 and ends on 3/21/2015 – JRW Mar 25 '15 at 19:53
  • OK, Please try a simple multiple linear regression with day of the week+day of the month+month of the year+holidays+pre and post dummies for holidays+linear and/or quadratic trend. Consider this as a naive model. See if any "complex" models improves out of sample accuracy from the naive model. Personally I would lean towards the model that @irishstat proposed. – forecaster Mar 26 '15 at 18:48
  • @StephanKolassa: Just for clarity, are you suggesting that I keep feeding my Fourier terms into xreg but also feed in the holiday dummies? Or are you suggesting I only feed in the holiday dummies? I'm unsure of how I can feed in both. – JRW Mar 27 '15 at 12:33
  • @forecaster: I will try out the regression you suggest. With only elementary time series knowledge I have a feeling that my toolbox is about to expand! – JRW Mar 27 '15 at 12:42
  • I would propose feeding both "hand-coded" dummies *and* Fourier terms into `xreg`. You can do this by simply `cbind()`ing them together into one big matrix. – Stephan Kolassa Mar 27 '15 at 13:30
  • @forecaster;The model you suggest requires pre-specification of 1)the # of unique periods before &after each holiday that are important;2)the # of unique trends &when the trends changed (not handled by powers of time);3)that there are no level shifts(this is not the same as trends) ; 4) there are no pulses/one time anomalies;5)that the day-of-the-week parameters are invariant;6) there are no specific days in the month that have assignable cause;7)the error variance is homogeneous over time;8)there no long weekend effects around holidays ETC Each series needs analysis to find the best combo. – IrishStat Mar 29 '15 at 21:26
  • @forecaster ... continuing ... But I guess that's why you stated that my modelling/approach was preferred. – IrishStat Mar 29 '15 at 21:42

1 Answers1

4

As described elsewhere Time Series Forecasting with Daily Data: ARIMA with regressor daily data presents complications to some automated procedures and opportunities to others. I used AUTOBOX http://www.autobox.com/cms/ ( a commercial piece of software expressly designed and funded by major US beverages/retailers ) to extract information from the data that I have helped develop,. Apparently our monkeys/heuristics are up to this task as it was totally autonatic. A picture enter image description here is worth a thousand words. Here is a graph of the actual/fit and forecast which I believe suggests a reasonable model/approach. The final model included ( no surprise here) two different trends, daily effects, monthly effects , particular days of the month effects, long weekend effects , a few unusual values , very significant pre and post holiday effects along with major end-of-the-year effects . Specifically the forecasts for the next 365 days is interesting enter image description here The residuals from the final model exhibited reasonable stability enter image description here All models are wrong and some are even wronger (sic) .

EDIT AFTER @forecaster comments...

This is an excerpt from the equation .enter image description here .The size of the coefficients for the daily effects are small compared to the holiday effects which is why you are not seeing a strong day-of-the-week profile in the forecasts.

EDIT: DELIVERING FORECASTING ACCURACIES FOR MY NINJA BADGE

I developed models/forecasts from 6 origins for a 30 day out forecast and reported the MAPE. There were 1338 original observations thus the forecast origins were 1308,1278,1248,1218,1188 and 1158 respectively. enter image description here

IrishStat
  • 27,906
  • 5
  • 29
  • 55
  • If you wanted to be a ninja, you might not take the full data, and you might use the first 2/3 to predict the last 1/3 so you could evaluate forecasting errors vs. actual. I know it wasn't what the requestor asked, but it would be a good thing to know for bounding risk. – EngrStudent Mar 25 '15 at 21:54
  • 1
    The problem with your suggestion is that accuracy is then based upon a sample of 1. A more thorough approach is to generate models and forecasts from many origins for a user-specified horizon length . But this may be a task for tomorrow ... I will consider doing this f or a few origins say for a lengtht of 30 periods out, – IrishStat Mar 25 '15 at 22:04
  • +1, Exceptional results. I don't know why your response hasn't received much attention as this is a tough problem to solve. Can you please comment on why the model doesn't pick up the daily fluctuations ? May be due to scale??. – forecaster Mar 26 '15 at 18:44
  • I too often wonder why there is a loud non-response to what is presented. I put it out to invite other professionals like yourself to deliver/motivate viable alternatives and/or improvements in strategy. Big players like IBM/SAP/SAS seem to just wave their hands and walk away. This is not just a tough problem this is an incredibly tough problem and reflects the sum total of having solved problems like this for nearly 20 years ... all driven by Anheuser-Busch's desire to predict daily sales for 600,000 stores taking into account price, weather & promotions for some 100 different products. – IrishStat Mar 26 '15 at 20:47
  • +1 for your effort and results. Autobox looks like an awesome software (If only my bosses would let me get it...). Now it appears that I need to learn how to work all of this magic in R. Are there any resources that you guys think might be useful for doing this in R? – JRW Mar 27 '15 at 12:27
  • Thanks for the kudos! The concept of opportunity cost comes to mind. I suggest that if you don't get any ideas on "free magic in R" that you contact the friendly folks at AFS and work out an agreement as every purchase has it's price. – IrishStat Mar 27 '15 at 12:45
  • Without prejudice, the silence of the non-respondents speaks for itself. This problem is a potential dissertation topic and may be too hot/difficult to handle for many typical respondents. – IrishStat Mar 27 '15 at 19:45