3

I am attempting to predict demand for our service: both its quantity and, perhaps more importantly, its location (hotspots).

I am by no means an experienced statistician, so I need some help :)

I have all the historical data for our service: date, latitude and longitude.

As far as I understand, the first step is not to work with the latitude and longitude directly: somehow they need to be converted into a single dimension, right?

After that, what type of analysis should be done on the data?

I think dealing with the date directly might also be the wrong way to go. My idea here is to work only with days of the week, so I can predict the demand for a type of day (any Tuesday) instead of a specific Tuesday.

I am looking for some guidance as to how to achieve this. I am a good programmer, but I do need some help finding the right way.

Nick Cox
Zebs
  • 1
    This is very broad. The time series aspect alone is broad enough. Focusing on day of the week alone is going to be a good idea only if variations within weeks are the main component of variation: we can't say, but that sounds unlikely. If you don't get a good answer, you may need to make this much more specific. Meanwhile, there are many, many questions on this site about prediction: did you look at any? – Nick Cox Aug 22 '13 at 08:02
  • 1
    [forecasting] is a good tag to search. It does catch what you want more precisely than [prediction]. – Nick Cox Aug 22 '13 at 08:22

2 Answers

3

First of all, you need a programming language with good tools for predictive modeling. I like the caret package for R a lot, but the scikit-learn project for Python is also excellent if you are a Python person.

Your data has two components: geography and time. Geography is easier, so let's start there. Most linear models will fail to find hotspots, particularly if those hotspots are defined by the interaction of two variables (latitude and longitude). Let's pretend your data are constant, that is, the hotspots don't change over time. Many non-linear models will serve you well, in particular decision-tree-based models (such as a single decision tree, a random forest, or boosted trees). Decision trees can identify regional hotspots because they split repeatedly on latitude and longitude, carving the map into rectangular regions with different demand levels.
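To make the decision-tree idea concrete, here is a minimal Python sketch using scikit-learn rather than caret. Everything about the data is an assumption made for illustration: the coordinates are synthetic, the 20 x 20 grid and the tree depth are arbitrary choices, and counting requests per grid cell is just one possible way to turn point data into a regression target.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic, made-up requests: a uniform background plus one dense cluster.
background = rng.uniform(low=[40.0, -74.5], high=[41.0, -73.5], size=(2000, 2))
cluster = rng.normal(loc=[40.725, -73.975], scale=0.02, size=(1000, 2))
points = np.vstack([background, cluster])  # columns: latitude, longitude

# Aggregate the requests into a 20 x 20 grid of counts (the regression target).
counts, lat_edges, lon_edges = np.histogram2d(
    points[:, 0], points[:, 1],
    bins=20, range=[[40.0, 41.0], [-74.5, -73.5]],
)
lat_centers = (lat_edges[:-1] + lat_edges[1:]) / 2
lon_centers = (lon_edges[:-1] + lon_edges[1:]) / 2
grid_lat, grid_lon = np.meshgrid(lat_centers, lon_centers, indexing="ij")

X = np.column_stack([grid_lat.ravel(), grid_lon.ravel()])  # one row per cell
y = counts.ravel()                                         # requests per cell

# The tree splits on latitude and longitude, so it can box in the dense cells.
tree = DecisionTreeRegressor(max_depth=6, random_state=0)
tree.fit(X, y)

pred_hot = tree.predict([[40.725, -73.975]])[0]  # cell at the cluster centre
pred_far = tree.predict([[40.125, -73.625]])[0]  # cell far from the cluster
print(f"predicted demand near hotspot: {pred_hot:.1f}, far away: {pred_far:.1f}")
```

A random forest or a boosted model could be swapped in for the single tree; with real data you would also want to hold out part of the history to check the predictions.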

Time is a little trickier. Think about which components of time primarily affect demand for your product. Hour of day? Day of week? Month of year? Holidays? Create dummy variables to capture each of these effects, and include them in your model.
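As a rough illustration of the dummy-variable idea, the Python sketch below turns one timestamp into 0/1 indicator features. The function name `time_dummies`, the feature names and the `holidays` argument are all hypothetical; which components you actually need depends on your data.

```python
from datetime import date, datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def time_dummies(ts, holidays=frozenset()):
    """Return a dict of 0/1 indicator features for a single timestamp.

    `holidays` is a set of datetime.date objects that you supply yourself.
    """
    features = {}
    for i, name in enumerate(DAY_NAMES):
        features[f"dow_{name}"] = int(ts.weekday() == i)   # day of week
    for hour in range(24):
        features[f"hour_{hour:02d}"] = int(ts.hour == hour)  # hour of day
    for month in range(1, 13):
        features[f"month_{month:02d}"] = int(ts.month == month)  # month of year
    features["is_holiday"] = int(ts.date() in holidays)
    return features

# Example: a Tuesday morning in August, with one made-up holiday date.
row = time_dummies(datetime(2013, 8, 20, 8, 30), holidays={date(2013, 12, 25)})
active = [name for name, value in row.items() if value == 1]
print(active)
```

Tree-based models can use the full set of indicators as is. For a linear model with an intercept, drop one level from each group to avoid perfectly collinear columns.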

If you would like some example R code for any of the above, I'd be happy to provide it.

Zach
  • Hi Zach, for my project I am trying to do a similar thing. If it's possible, could you provide an R code example for this problem? Thanks a bunch for any help. – user1021713 May 16 '19 at 08:22
  • Do you have an example dataset? – Zach May 22 '19 at 14:20
  • Yeah:
    Date | State | Districts | Latitude | Longitude | Max. Temperature | Min. Temperature | Humidity | Pressure | Hits | States
    18-Jul | Andhra Pradesh | Chittoor | 13.2172 | 79.1003 | 36 | 26 | 58 | 1004
    18-Nov | Chattisgarh | Janjgir-champa | 21.9706 | 82.4753 | 34 | 16 | 61 | 1012
    18-Sep | Gujarat | Amreli | 21.6032 | 71.2221 | 35 | 25 | 81 | 1008
    18-Aug | Karnataka | Chikkaballapura | 13.4324 | 77.728 | 29 | 19 | 76 | 1012
    18-Nov | Madhya Pradesh | Burhanpur | 21.3194 | 76.2224 | 34 | 14 | 53 | 1012.6
    19-Mar | Maharastra | Nasik | 19.9975 | 73.7898 | 39 | 16 | 64 | 1012
    19-Jan | Tamil Nadu | Pudukkottai | 10.3797 | 78.8208 | 34 | 18 | 68 | 1014
    19-Apr | Rajasthan | Udaipur | 24.5854 | 73.7125 | 42 | 22 | 36 | 1009
    – user1021713 May 23 '19 at 04:47
2

I would recommend first looking at your data in a visualisation package, e.g. Tableau, which also handles map data easily. Heatmaps come to mind for spotting hotspots!
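If you prefer code to a GUI tool, the counts behind a heatmap are easy to compute yourself. The Python sketch below is only an illustration, not what Tableau does internally: the function name `grid_counts`, the 0.01-degree cell size and the sample coordinates are all made up.

```python
import math
from collections import Counter

def grid_counts(points, cell_size=0.01):
    """Count (lat, lon) points per square grid cell of `cell_size` degrees."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts

# Made-up request locations: three close together, one elsewhere.
requests = [
    (40.705, -74.005),
    (40.706, -74.004),
    (40.704, -74.006),
    (40.755, -73.985),
]

counts = grid_counts(requests)
busiest_cell, busiest_count = counts.most_common(1)[0]
print(busiest_cell, busiest_count)
```

The resulting counts per cell can be passed to any plotting library to draw the heatmap, and the busiest cells are your candidate hotspots.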

seanv507