RNG, R, mclapply and cluster of computers

Question

I'm running a simulation in R on a cluster of computers and have the following problem. On each of X computers I run:

fxT2 <- function(i) runif(10)
nessay <- 100
c(mclapply(1:nessay, fxT2), recursive=TRUE)

There are 32 computers, each with 16 cores. However, around 2% of the random numbers are identical. What strategies would you adopt to avoid this?

I've been able to avoid this problem for fxT2 by adding a delay (i.e. staggering by one second the time at which each job is sent to each of the X computers). But this seems very ad hoc and specific to fxT2.

The problem is that in reality fxT2 is a long task involving pseudo-random numbers. At the end of the process, I expect to get X*nessay replications of the same statistical experiment, not just nessay replications. How can I make sure that this is indeed the case, and is there a way to check it?
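One way to check for repeated draws, as a minimal sketch: gather the values returned by every machine and count duplicates, overall and per pair of machines. The `results` list and its machine names below are hypothetical stand-ins for the real output, and the third entry is copied on purpose to mimic a machine that started from the same seed.

```r
# Hypothetical stand-in for the values returned by each machine.
set.seed(1)
results <- list(machineA = runif(1000), machineB = runif(1000))
# Simulate the failure: a third machine that started from machineA's seed.
results$machineC <- results$machineA

draws <- unlist(results, use.names = FALSE)

# Fraction of draws that repeat an earlier draw.
dup_fraction <- mean(duplicated(draws))

# Which pairs of machines share at least one value?
shared <- outer(seq_along(results), seq_along(results),
                Vectorize(function(i, j) any(results[[i]] %in% results[[j]])))
dimnames(shared) <- list(names(results), names(results))

print(dup_fraction)
print(shared)
```

For a long task, comparing only the first few draws of each job (or the seed each job started from) should be enough to spot two jobs running the same stream.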

Good question. Have a look at this [question](http://stats.stackexchange.com/questions/3532/random-numbers-and-the-multicore-package) on random numbers and the multicore package — csgillespie, Feb 17 '11 at 17:54
@csgillespie: thanks for the pointer, but I'm not sure it's the same problem. As I understand the question you pointed to, all the processes there are spawned by mclapply. Here it's a bit different: on each of the machines, all the processes are spawned by mclapply, but this is not the case *across* machines. — user603, Feb 18 '11 at 10:09

score 6 · Accepted Answer · answered Feb 17 '11 at 18:27


The snow package has explicit support for initialising a given number of RNG streams in a cluster computation.

It can employ one of two RNG implementations:

rsprng and
rlecuyer

Otherwise you have to do the coordination by hand.
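A rough sketch of what coordination by hand could look like. It assumes a current R where the base `parallel` package (not mentioned above) provides `mclapply()` and `nextRNGStream()`. The helper names `machine_seeds` and `run_machine`, the master seed 2011, and the idea that each machine knows its own `machine_id` (1 to X, e.g. passed on the command line) are all assumptions for illustration. Every task on every machine gets its own L'Ecuyer-CMRG stream, so no two jobs share a seed.

```r
library(parallel)  # base R: mclapply() and nextRNGStream()

# Hypothetical helper: the seeds of the streams reserved for one machine.
# Machine k uses streams (k - 1) * n_tasks + 1, ..., k * n_tasks.
# Note: this switches the session's RNG kind to L'Ecuyer-CMRG.
machine_seeds <- function(master_seed, machine_id, n_tasks) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(master_seed)            # same master seed on every machine
  s <- .Random.seed
  # Skip the streams that belong to machines 1 .. machine_id - 1.
  for (i in seq_len((machine_id - 1) * n_tasks)) s <- nextRNGStream(s)
  seeds <- vector("list", n_tasks)
  for (j in seq_len(n_tasks)) {
    seeds[[j]] <- s
    s <- nextRNGStream(s)
  }
  seeds
}

# The task now receives its own stream seed and installs it before drawing.
fxT2 <- function(seed) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(10)
}

nessay <- 100

run_machine <- function(machine_id) {
  seeds <- machine_seeds(master_seed = 2011, machine_id, nessay)
  # Forking is unavailable on Windows, so fall back to a single core there.
  cores <- if (.Platform$OS.type == "windows") 1L else 2L
  # mc.set.seed = FALSE: leave the per-task seeds set inside fxT2 alone.
  unlist(mclapply(seeds, fxT2, mc.set.seed = FALSE, mc.cores = cores))
}

a <- run_machine(1)
b <- run_machine(2)
```

Because the seeds depend only on the master seed, the machine index and the task index, rerunning a machine reproduces its draws exactly, and no start-up delay is needed.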

Dirk Eddelbuettel


score 3 · Answer 2 · answered Feb 17 '11 at 16:39


You need to use an RNG specifically designed for parallel computing. See the "Parallel computing: Random numbers" section of the High Performance Computing Task View.
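As an illustration of such a generator, here is a small sketch using L'Ecuyer-CMRG streams through the base `parallel` package. That package is an assumption here (it is not named in this thread), and the two local workers and the seed 42 are placeholders. `clusterSetRNGStream()` hands each worker its own stream derived from a single seed.

```r
library(parallel)

cl <- makeCluster(2)                 # two local worker processes (demo assumption)
clusterSetRNGStream(cl, iseed = 42)  # one L'Ecuyer-CMRG stream per worker
first <- clusterEvalQ(cl, runif(5))

clusterSetRNGStream(cl, iseed = 42)  # same seed again: the same streams
second <- clusterEvalQ(cl, runif(5))

stopCluster(cl)
```

The workers draw from different streams, and resetting with the same `iseed` reproduces the earlier draws.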


Joshua Ulrich


You also need to coordinate between the RNG streams. Snow does that, multicore may now. – Dirk Eddelbuettel Feb 17 '11 at 18:23
