I've been struggling for some time trying to figure out the most appropriate way to analyse some data. My task is to (hopefully) explain what may be driving the flow of visitors/tourists to two historic sites, and I've got monthly totals for them from the past 5-6 years (also split by visitor countries of origin + ticket types, but that might just be complicating things...).

I've also gathered a bunch of extra monthly data (weather, economic indicators, etc.) in an attempt to explain variations in visitors over time. See here for a subset of the data (the real data contains many more rows, and several more predictors than just Temperature and CCI). There is also clear seasonality in the visitor data (spikes over summer), as well as a generally increasing trend in visitor numbers.

If I am understanding correctly, we're already deviating from 'typical' time series territory, because I'm not really trying to predict the data from itself, but rather, from external sources. Hence, even if I've been coming across things like 'differencing' or 'de-trending', I am not sure if these make sense here...? It's rather the trend itself that I am trying to account for.

My attempts to deal with this so far have involved creating nlme::lme() or mgcv::gamm() models which can specify various random effects & autoregressive structures fitted to residuals - the latter being my attempt to take into account the fact that this is a time series where monthly measurements will be related to one another in some way. But (at least the way I've been specifying models), the seasonality in the data is not being handled well by lme(), and with gamm() I also have some doubts I am specifying the models correctly, as GAMs are quite new to me (actually, time series is new to me, generally...which makes this all the more challenging).

The overarching issue here is that I am not sure which approach is the most defensible with this data, and for this problem (how to explain what drives visitor numbers over time to these two sites, based on the predictors I have?)

Help appreciated - hopefully I'm not completely wrong in my thinking about all this!

PS. Something else that occurred to me was to extract just the trend or seasonally-adjust the data to simplify things (with stl() or forecast::seasadj()) and try to predict just that, but again, not sure whether that is justifiable and/or customary.
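
For illustration only, the seasonal-adjustment idea in the PS can be sketched in base R with stl(). The series below is simulated, and the variable names (visitors, idx) and the choice of s.window = "periodic" are assumptions rather than anything taken from the real data.

```r
# Simulated monthly series with trend and summer peaks (a stand-in, not real data)
set.seed(7)
n   <- 72                                   # six years of monthly totals
idx <- seq_len(n)
visitors <- ts(1000 + 6 * idx + 300 * sin(2 * pi * (idx - 4) / 12) + rnorm(n, sd = 40),
               start = c(2013, 1), frequency = 12)

# STL splits the series into seasonal + trend + remainder
dec      <- stl(visitors, s.window = "periodic")
seasonal <- dec$time.series[, "seasonal"]
trend    <- dec$time.series[, "trend"]

# Seasonally adjusted series: the original minus the estimated seasonal component
adjusted <- visitors - seasonal

plot(dec)
```

Whether modelling `adjusted` or `trend` against the predictors is defensible is exactly the open question here; the sketch only shows what would be extracted.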

  • "Typical time series territory" in your lexicon is called SARIMA, but time series analysis is far more than that; see my response to https://stats.stackexchange.com/questions/380599/is-it-possible-to-automate-time-series-forecasting/380634#380634 (particularly the General ARMAX model image). If you wish to post your data, I will try to help further by suggesting an approach. – IrishStat Dec 07 '18 at 12:43
  • Thanks for this info! I've added a link above to some data to hopefully give you a better idea of what I'm dealing with. It's just a subset, since I don't have permission to share the full/true data... – LexConstantine Dec 07 '18 at 13:45

1 Answer

You might want to read http://www.autobox.com/pdfs/regvsbox-old.pdf and http://docplayer.net/12080848-Outliers-level-shifts-and-variance-changes-in-time-series.html and http://faculty.chicagobooth.edu/ruey.tsay/teaching/uts/lec10-08.pdf as time series brings special opportunities.

For a particular country and a particular ticket type, over time, you could follow "Transfer function in forecasting models - interpretation". There is a basic primer on model identification at https://onlinecourses.science.psu.edu/stat510/node/75 and somewhat more advanced ideas at https://autobox.com/pdfs/TSAY.pdf. (You can safely ignore the suggestion on how to identify a transfer function, as it doesn't work when you have pulses etc.)
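
As a rough starting point in R, the simplest member of this family is a regression with autocorrelated errors, which base R's arima() fits through its xreg argument. This is only a sketch on simulated data: the series, the column names (trend, temp, cci) and the AR(1) error order are all assumptions, the predictors enter at lag 0 only, and nothing here detects pulses or level shifts, so it is not a full transfer function identification.

```r
# Simulated stand-in for one country and one ticket type; every name is hypothetical
set.seed(1)
n     <- 72                                                    # six years, monthly
month <- rep(1:12, length.out = n)
temp  <- 10 + 8 * sin(2 * pi * (month - 4) / 12) + rnorm(n)    # temperature proxy
cci   <- 100 + cumsum(rnorm(n, sd = 0.5))                      # confidence index proxy
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 20)) # AR(1) errors
visitors <- ts(1000 + 5 * seq_len(n) + 30 * temp + 4 * (cci - 100) + noise,
               start = c(2013, 1), frequency = 12)

# External predictors, lag 0 only; a fuller analysis would also test lagged effects
X <- cbind(trend = seq_len(n), temp = temp, cci = cci)

# Linear regression on X with AR(1) errors, estimated jointly by maximum likelihood
fit <- arima(visitors, order = c(1, 0, 0), xreg = X, method = "ML")
print(fit)

# Residual check: remaining autocorrelation suggests the error model is inadequate
print(Box.test(residuals(fit), lag = 12, type = "Ljung-Box", fitdf = 1))
```

If the Ljung-Box test still rejects, the error structure (or missing lagged effects of the predictors) needs more work before the coefficients are interpreted.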

With respect to your data privacy: I would scale/code the data to mask it, hiding the country and ticket type, then post it for the list and evaluate the submitted results. Otherwise you are left to your own limited resources, with possible consequences from trying to use software that may not be robust.
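
One possible way to do that masking in base R is sketched below. The data frame, its column names and the helper mask_labels() are all invented for illustration, and the z-score is just one choice of scaling. Since it is a linear rescaling, it leaves correlations and the shape of the series intact, but it is not strong anonymisation.

```r
# Hypothetical raw data; column names and values are invented for illustration
set.seed(42)
raw <- data.frame(
  month    = rep(seq(as.Date("2013-01-01"), by = "month", length.out = 24), times = 2),
  country  = rep(c("Germany", "France"), each = 24),
  ticket   = rep(c("Adult", "Child"), times = 24),
  visitors = round(runif(48, 500, 5000))
)

# Replace real labels with arbitrary codes (hypothetical helper)
mask_labels <- function(x, prefix) {
  lev <- sample(unique(x))          # shuffle so codes do not follow the original order
  paste0(prefix, match(x, lev))
}

masked <- transform(
  raw,
  country  = mask_labels(country, "C"),
  ticket   = mask_labels(ticket, "T"),
  visitors = as.numeric(scale(visitors))   # z-score hides the absolute visitor levels
)
head(masked)
```

Keep the label mapping and the original mean and standard deviation private so that any submitted results can be translated back.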

Hope this helps. Finally, if you want more reading: https://stats.stackexchange.com/search?q=user%3A3382+transfer+function

IrishStat