I want to programmatically determine the hours of the day at which a web page is most likely to be hit (accessed).

Which statistical formula should I use to calculate the peak hours of a web page, given the data below?

For example, for page xyz the hour is on the left and the hit count is on the right. Hit counts differ from page to page.

Page xyz hits count data:

hr=hits
1=0
2=0
3=0
4=0
5=14
6=0
7=0
8=5
9=5
10=8
11=10
12=10
13=12
14=7
15=5
16=5
17=3
18=0
19=0
20=0
21=0
22=0
23=0
24=0
user3368626
  • The data already give you the peak hours. What exactly are you looking for that you don't already have? – David Marx Mar 07 '14 at 07:25
  • What I want to do is calculate a threshold value from this data. If the number of hits in a particular hour exceeds this threshold, I will declare that hour a peak hour. So how should I calculate the threshold? Currently I calculate it as total number of hits / 24 hours, e.g. if total hits are 100 then 100/24 ≈ 4.17, so any hour with more than 4.17 hits is declared a peak hour, otherwise not. I am a computer science student and weak in statistics. If there is a better way to do this statistically, please guide me. Thanks. – user3368626 Mar 07 '14 at 07:45
  • Which ones do you consider peak hours in this example? (5, 11, 12 and 13?) – Matt Feb 09 '15 at 10:13
  • This is a Poisson process: you count the number of visits in a given time interval (per hour). As a starting point, see [Why is the Poisson distribution chosen to model arrival processes in Queueing theory problems?](http://stats.stackexchange.com/questions/18821/why-is-the-poisson-distribution-chosen-to-model-arrival-processes-in-queueing-th) – usεr11852 Nov 01 '15 at 08:55
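
For reference, the mean-based threshold described in the comments (total hits divided by 24 hours) can be sketched in a few lines of Python. The function name `mean_threshold_peaks` is made up for illustration; the data is the page xyz table from the question.

```python
# Hourly hit counts for page xyz, copied from the question (hour -> hits).
hits = {
    1: 0, 2: 0, 3: 0, 4: 0, 5: 14, 6: 0, 7: 0, 8: 5,
    9: 5, 10: 8, 11: 10, 12: 10, 13: 12, 14: 7, 15: 5, 16: 5,
    17: 3, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0,
}


def mean_threshold_peaks(hits):
    """Return (threshold, peak_hours) using the mean hits per hour as threshold.

    This is the asker's own baseline rule, not a statistically motivated one.
    """
    threshold = sum(hits.values()) / len(hits)
    peak_hours = [hour for hour, n in sorted(hits.items()) if n > threshold]
    return threshold, peak_hours


threshold, peak_hours = mean_threshold_peaks(hits)
print(threshold)   # 84 hits over 24 hours -> 3.5
print(peak_hours)  # [5, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```

With this rule almost every hour that has any traffic counts as a peak, because the many zero-hit hours pull the mean down.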

1 Answer

You are looking for a clustering algorithm in one dimension. If I understand correctly, you would like to separate peak hours from non-peak hours automatically. First, try reasoning about your distribution. Create a histogram with bins of hit counts on the x axis (e.g. 0-5, 6-10, and so on) and, on the y axis, the number of hours whose hit count falls in each bin. You can then separate the two classes in many ways. The easiest is to take the minimum of the histogram between its two peaks as the separation point. Other methods you can try are based on probability density estimation or on k-means clustering.
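
As a rough illustration of the k-means option, here is a minimal pure-Python sketch of one-dimensional 2-means applied to the hit counts from the question. It is only one possible reading of the suggestion: the function name `two_means_threshold`, the choice of the minimum and maximum counts as starting centers, and the use of the midpoint between the two final centers as the threshold are all assumptions made for this example.

```python
# Hourly hit counts for page xyz, copied from the question (hour -> hits).
hits = {
    1: 0, 2: 0, 3: 0, 4: 0, 5: 14, 6: 0, 7: 0, 8: 5,
    9: 5, 10: 8, 11: 10, 12: 10, 13: 12, 14: 7, 15: 5, 16: 5,
    17: 3, 18: 0, 19: 0, 20: 0, 21: 0, 22: 0, 23: 0, 24: 0,
}


def two_means_threshold(counts, max_iter=100):
    """1-D k-means with k=2; returns the midpoint between the two centers.

    Assumption: centers start at the smallest and largest count.
    """
    lo, hi = float(min(counts)), float(max(counts))
    for _ in range(max_iter):
        cut = (lo + hi) / 2
        low = [c for c in counts if c <= cut]   # "non-peak" cluster
        high = [c for c in counts if c > cut]   # "peak" cluster
        if not low or not high:                 # e.g. all counts are equal
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):        # assignments stopped changing
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


cut = two_means_threshold(list(hits.values()))
peak_hours = [hour for hour, n in sorted(hits.items()) if n > cut]
print(round(cut, 2))  # threshold between the two cluster centers
print(peak_hours)     # [5, 10, 11, 12, 13, 14]
```

On this data the threshold lands between 5 and 7 hits, so the hours with 5 or fewer hits fall into the non-peak cluster. With only 24 values per page the result is sensitive to outliers such as the isolated spike at hour 5, so it is worth checking against the histogram by eye.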

natbusa