For a clickstream simulation I need to generate sequences of randomly distributed timestamps.

Each sequence should:

  • start and end between sim_start and sim_end,
  • contain hit_count timestamps.
Tim
Oren Bochman
  • In a solution at http://stats.stackexchange.com/questions/129322, I generated a process of random independent events with `cumsum(round(rexp(n.events, arrival.rate), 2))`. To achieve a specific number of such events, you could generate hit_count+1 of them, rescale into the intended time interval, and drop the first or last. – whuber Feb 03 '17 at 14:42
  • "random" is ambiguous. Do you want uniform or gaussian distribution, for example? – Carl Witthoft Feb 03 '17 at 15:50
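
The rescaling idea from the first comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the window bounds, the value of hit_count and the seed are made up, and the variable names gaps, arrivals, frac and window_secs are hypothetical. Because the arrival times are rescaled into the window, the rate passed to rexp does not matter, so the default rate is used.

```r
# Hypothetical simulation window and sequence length (names from the question)
sim_start <- as.POSIXct("2017-02-03 08:00:00", tz = "UTC")
sim_end   <- as.POSIXct("2017-02-03 09:00:00", tz = "UTC")
hit_count <- 20

set.seed(1)  # only to make the sketch reproducible

# Generate hit_count + 1 exponential inter-arrival gaps and accumulate them
gaps     <- rexp(hit_count + 1)
arrivals <- cumsum(gaps)

# Rescale by the last arrival time and drop it, leaving hit_count
# increasing fractions strictly between 0 and 1
frac <- arrivals[seq_len(hit_count)] / arrivals[hit_count + 1]

# Map the fractions into the simulation window (POSIXct + seconds)
window_secs <- as.numeric(difftime(sim_end, sim_start, units = "secs"))
hits <- sim_start + frac * window_secs
hits
```

The result is already sorted, and every timestamp falls strictly inside the window.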

1 Answer

Computers have different ways of storing time data. For example, R uses the date-time classes POSIXlt and POSIXct. From the documentation:

Class "POSIXct" represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector.

So time is stored as a number of seconds:

Sys.time()
## [1] "2017-02-03 10:34:35 CET"
as.numeric(Sys.time())
## [1] 1486114478

This means that if you want to sample timestamps, you simply need to sample values from $0$ to $k$ (the maximal number of seconds from the origin of your choice) and then transform them to timestamps, e.g.

u <- runif(10, 0, 60) # "noise" to add to some timepoint, in seconds
# sample 60 seconds starting from this origin (i.e. time 0);
# the origin string is read as UTC, so it prints as 09:00 in CET
as.POSIXlt(u, origin = "2017-02-03 08:00:00")
## [1] "2017-02-03 09:00:44 CET" "2017-02-03 09:00:30 CET" "2017-02-03 09:00:06 CET" "2017-02-03 09:00:12 CET" "2017-02-03 09:00:36 CET"
## [6] "2017-02-03 09:00:16 CET" "2017-02-03 09:00:18 CET" "2017-02-03 09:00:34 CET" "2017-02-03 09:00:22 CET" "2017-02-03 09:00:35 CET"
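
Applied to the question, the same idea might look like the sketch below. It assumes that "randomly distributed" means uniform over the window; the window bounds, the value of hit_count and the seed are made up for illustration, and secs and hits are hypothetical names.

```r
# Hypothetical simulation window and sequence length (names from the question)
sim_start <- as.POSIXct("2017-02-03 08:00:00", tz = "UTC")
sim_end   <- as.POSIXct("2017-02-03 09:00:00", tz = "UTC")
hit_count <- 20

set.seed(1)  # only to make the sketch reproducible

# Sample seconds since 1970-01-01 uniformly inside the window, then sort
# so the sequence is in chronological order
secs <- sort(runif(hit_count, as.numeric(sim_start), as.numeric(sim_end)))

# Convert the numeric values back to timestamps
hits <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
hits
```

Because runif draws continuous values, ties are essentially impossible unless the timestamps are rounded afterwards.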

Outside of R, you can follow the same procedure: sample some values and add them to (or subtract them from) a time object, such as =NOW() in Excel or systime in databases.

Notice that this procedure also lets you sample non-uniformly distributed times if you draw from a different distribution, for example a normal distribution as in the example below.

hist(as.POSIXlt("2017-02-03 08:00:00") + rnorm(1e6, 0, 60*60), 100)

[Figure: Randomly generated timestamps]

Tim
  • This procedure doesn't look quite right, because (a) the sample is discrete yet (b) does not allow for ties. Real timestamps wouldn't behave like that: they would all be distinct, if represented with sufficient precision; and if coarsely rounded, there would be some ties. – whuber Feb 03 '17 at 14:38
  • @whuber notice that I use discrete sampling just as an example, in the second example I use continuous sampling. But I'll edit this for clarity. – Tim Feb 03 '17 at 14:45
  • I did notice that--but you use discrete sampling *without replacement*. – whuber Feb 03 '17 at 14:46
  • @whuber I didn't mean to; that was a mistake. Thanks for watching and for letting me know! – Tim Feb 03 '17 at 14:51