
As a side hobby, I have been exploring forecasting time series (in particular, using R).

For my data, I have the number of visits per day, for every day going back almost 4 years. In this data there are some distinct patterns:

  1. Mon-Fri have a lot of visits (highest on Mon/Tue), but Sat-Sun have drastically fewer.
  2. Certain times of the year drop (e.g. far fewer visits around U.S. holidays, and summers show less growth)
  3. Significant growth year-to-year

It would be nice to be able to forecast an upcoming year with this data, and also use it to have seasonally adjusted month-to-month growth. The main thing that throws me off with a monthly view is:

  • Certain months will have more Mon/Tue than other months (and that isn't consistent across years either). Therefore a month that happens to have more weekdays needs to be adjusted accordingly.
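One way to see how much the weekday mix varies is to count each day of the week per month. Below is a minimal base R sketch; the year 2012 is an arbitrary choice for illustration and is not taken from the data in this post.

```r
# Count how many times each day of the week occurs in each month of one year.
days  <- seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "day")
month <- format(days, "%m")

# "%u" is the ISO weekday number (1 = Monday ... 7 = Sunday); unlike
# weekdays(), it does not depend on the locale.
wday <- as.integer(format(days, "%u"))

counts <- table(month, wday)
colnames(counts) <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Number of Mon-Fri days in each month
weekdays_per_month <- rowSums(counts[, 1:5])

print(counts)
print(weekdays_per_month)
```

Dividing each monthly total by its number of weekdays, or weighting each day type separately, is one possible adjustment; which one is appropriate depends on the data.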

Exploring weeks also seems difficult, since week numbering systems give 52 or 53 weeks depending on the year, and it seems `ts` doesn't handle that.
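The 52/53 variation can be reproduced in base R, assuming ISO 8601 week numbering (the post does not say which numbering system is meant). December 28 always falls in the last ISO week of its year, so its week number gives the number of weeks in that year. A small sketch:

```r
# Number of ISO 8601 weeks in each year, read off the week number of Dec 28.
years <- 2008:2012
dec28 <- as.Date(paste0(years, "-12-28"))

# "%V" formats the ISO 8601 week of the year (01-53).
iso_weeks <- as.integer(format(dec28, "%V"))
names(iso_weeks) <- years
print(iso_weeks)

# A 365-day year is 52 whole weeks plus one day, so no fixed integer
# number of weeks lines up with the calendar year.
print(365 %% 7)
```

This is why a single fixed `frequency` for weekly data cannot match the calendar exactly.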

I'm considering taking the average over the weekdays of each month, but the resulting unit is a bit strange (growth in average weekday visits), and that would drop valid data.

I feel this sort of data would be common in time series (for example, electricity usage in an office building might look something like this). Does anyone have any advice on how to model it, in particular in R?

The data I am working with is pretty straightforward. It starts like this:

            [,1]
2008-10-05 17607
2008-10-06 36368
2008-10-07 40250
2008-10-08 39631
2008-10-09 40870
2008-10-10 35706
2008-10-11 18245
2008-10-12 23528
2008-10-13 48077
2008-10-14 48500
2008-10-15 49017
2008-10-16 50733
2008-10-17 46909
2008-10-18 22467

and continues like this up to the present, with an overall trend of growth, some dips around US holiday weeks, and growth generally slowing during the summer.
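As a quick check of the weekday/weekend gap, the 14 values printed above can be summarised in base R. This is only a sketch on that small sample, and the variable names are illustrative.

```r
# The 14 daily counts shown in the post, starting on 2008-10-05.
visits <- c(17607, 36368, 40250, 39631, 40870, 35706, 18245,
            23528, 48077, 48500, 49017, 50733, 46909, 22467)
dates  <- seq(as.Date("2008-10-05"), by = "day", length.out = length(visits))

# ISO weekday number: 1 = Monday ... 7 = Sunday
iso_day    <- as.integer(format(dates, "%u"))
is_weekend <- iso_day >= 6

# Mean visits for weekdays versus weekend days
mean_by_type <- tapply(visits, ifelse(is_weekend, "weekend", "weekday"), mean)
print(mean_by_type)
print(mean_by_type[["weekday"]] / mean_by_type[["weekend"]])
```

On this sample the weekday mean is roughly twice the weekend mean, which is the weekly pattern any model of the full series has to absorb.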

Kyle Brandt
  • Another interesting aspect of the data is that there are sudden events that interrupt the overall growth trend for a period of about a couple of months. Right now, though, since I am at the stage of trying to set up seasonality properly, I am ignoring that aspect. – Kyle Brandt Feb 21 '12 at 17:38
  • Also, correct me if I am not using "seasonality" correctly. I am currently thinking of it as a pattern within the time unit I name. So "Weekly Seasonality" to me means "A pattern that repeats every week". – Kyle Brandt Feb 21 '12 at 17:40
  • See answers to http://stats.stackexchange.com/questions/14742/auto-arima-with-daily-data-how-to-capture-seasonality-periodicity. Might be a starting point. – Peter Ellis Feb 21 '12 at 20:15
  • Maybe at the heart of this is the combination of week + year? It seems `ts` (and even `msts`) doesn't fit a sampling period of a week with a "natural" period of a year (Nor do calendars really I guess). Or, I just don't understand how to make that work... – Kyle Brandt Feb 21 '12 at 20:45
  • @IrishStat: Not sure if I can share the data, so I put up a small chunk of it with a description of what the rest would be doing. It is going to be very similar to the following, just perhaps more accurate. https://www.quantcast.com/p-c1rF4kxgLUzNc . I'm more interested in learning how to work with this sort of time series than I am in specific results around this particular example. – Kyle Brandt Feb 21 '12 at 21:04
  • I gave up on weekly data because of the 52/53-week issue (and others related to software not handling weeks properly), and went to monthly data. This decreased the noise and also made it more intuitive (quick, tell me all about April, then tell me all about week 17 of the year). You still have to compensate for features that change for a month across years (floating holidays, number of weekend days, etc.), but it seems simpler than trying to make tools work with weeks. – Wayne Feb 21 '12 at 21:12
  • @Wayne: After doing a little more research, I think this is a limitation of what is currently available in R. It seems the forecast package depends on `ts`, which is frequency-based and ignorant of a real calendar. To take days of the week into account AND yearly seasonality, I would probably need forecasting based on something like `xts`. – Kyle Brandt Feb 22 '12 at 16:55
  • @Kyle: It's a hard problem, and I'm not sure what a good solution would look like for weekly time series. Weeks change more radically than months from year-to-year, with weekend/weekday, holiday, floating holiday, special events (sales, promotions, etc), etc. – Wayne Feb 22 '12 at 17:44

1 Answer


I model this kind of data all the time. You need to incorporate

  • day-of-the-week
  • holiday effects (lead, contemporaneous and lag effects)
  • special days-of-the-month
  • perhaps Friday before a holiday or a Monday after a holiday
  • weekly effects
  • monthly effects
  • ARIMA structure to render the errors white noise
  • etc.
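A few of the components listed above (day-of-week effects, a contemporaneous holiday dummy, and ARIMA errors) can be sketched with base R's `stats::arima` and its `xreg` argument. This is a simplified regression with AR(1) errors fitted to simulated data, not a full intervention-detection procedure: every number in the simulation, and the holiday dates, are assumptions made up for illustration.

```r
set.seed(1)

# One simulated year of daily data, starting on a Monday (hypothetical).
dates <- seq(as.Date("2010-01-04"), by = "day", length.out = 364)
dow   <- factor(format(dates, "%u"), levels = as.character(1:7))  # 1 = Mon
trend <- seq_along(dates)

# Illustrative holiday dates only; a real model would use a full calendar
# and could add lead and lag dummies around each holiday.
holidays <- as.Date(c("2010-05-31", "2010-07-05", "2010-09-06",
                      "2010-11-25", "2010-12-24"))
holiday  <- as.integer(dates %in% holidays)

# Simulated visits: level + growth + day-of-week effect + holiday dip
# + AR(1) noise. All effect sizes are invented.
dow_effect <- c(0, 0, -500, -300, -2000, -20000, -18000)
noise  <- as.numeric(arima.sim(list(ar = 0.5), n = length(dates), sd = 1000))
visits <- 40000 + 20 * trend + dow_effect[as.integer(dow)] -
  15000 * holiday + noise

# Regressors: day-of-week dummies (Monday is the baseline), trend, holiday.
X <- cbind(model.matrix(~ dow)[, -1], trend = trend, holiday = holiday)

# Regression with AR(1) errors.
fit <- arima(visits, order = c(1, 0, 0), xreg = X, method = "ML")
print(round(coef(fit), 2))

# Ljung-Box test on the residuals as a rough white-noise check.
print(Box.test(residuals(fit), lag = 14, type = "Ljung-Box", fitdf = 1))
```

With real data, the AR(1) order and the single holiday dummy would be starting points to revise after looking at the residuals, not fixed choices.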

The statistical approach is called Transfer Function Modelling with Intervention Detection. If you want to share your data, either privately via dave@autobox.com or preferably via SE, I would be more than glad to show you the specifics of a final model and further your ability to do it yourself, or at least to help you and others understand what needs to be done and what can be done. In either case you come out smarter without spending any treasure, be it coin or time. You might read some of my other responses to time series questions to learn more.

Ferdi
IrishStat