I'm running a simulation in R on a cluster of computers and have the following problem. On each of the X computers I run:

    library(parallel)  # provides mclapply (in 2011 it came from the multicore package)
    fxT2 <- function(i) runif(10)
    nessay <- 100
    c(mclapply(1:nessay, fxT2), recursive=TRUE)

There are X = 32 computers, each with 16 cores. However, around 2% of the random numbers are identical. What strategies would you adopt to avoid this?

I've been able to avoid this problem for fxT2 by adding a latency (i.e. delaying by one second the time at which each job is sent to each of the X computers), but this seems very ad hoc and specific to fxT2.

The problem is that in reality fxT2 is a long task involving pseudo-random numbers. At the end of the process, I expect to get X*nessay independent replications of the same statistical experiment, not nessay replications. How can I make sure that this is indeed the case, and is there a way to check it?
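One way to check is to pool the raw draws (or per-replication results) from all machines and count exact repeats. When no seed has been set, R creates one from the current time and the process ID (see `?Random`), which is one plausible way for machines started at the same moment to end up on identical streams. With continuous draws from independent streams, exact repeats should be vanishingly rare, so a noticeable fraction points to shared seeds. A minimal sketch; `duplicate_fraction` and `draw` are hypothetical helpers, and the three "machines" are simulated with fixed seeds:

```r
# Hypothetical helper: share of pooled values that repeat a value seen earlier.
# With continuous draws from independent streams this should be (almost) 0.
duplicate_fraction <- function(results) {
  x <- unlist(results)
  mean(duplicated(x))
}

# Hypothetical stand-in for one machine's output: n uniforms from a given seed.
draw <- function(seed, n = 1000) {
  set.seed(seed)
  runif(n)
}

# Machines "a" and "b" accidentally share a seed; "c" does not.
results <- list(a = draw(42), b = draw(42), c = draw(7))
duplicate_fraction(results)  # about 1/3: every value from "b" repeats one from "a"
```

For whole replications, `duplicated()` also works row-wise on a matrix, so stacking one summary row per replication gives a stricter check.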

csgillespie
user603
  • Good question. Have a look at this [question](http://stats.stackexchange.com/questions/3532/random-numbers-and-the-multicore-package) on random numbers and the multicore package – csgillespie Feb 17 '11 at 17:54
  • @csgillespie: Thanks for the pointer, but I'm not sure it's the same problem. As I understand the question you pointed to, all the processes are spawned by mclapply. Here it's a bit different: on each machine, the processes are spawned by mclapply, but this is not the case *across* machines. – user603 Feb 18 '11 at 10:09

2 Answers

The snow package has explicit support for initialising a given number of RNG streams in a cluster computation.

It can employ one of two RNG implementations:

  • the SPRNG library, via the rsprng package
  • L'Ecuyer's RNGstreams, via the rlecuyer package

Otherwise you have to do the coordination by hand.
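In current versions of R, copies of snow's cluster functions (and of multicore's mclapply) ship in the base parallel package, whose `clusterSetRNGStream()` gives each worker its own L'Ecuyer-CMRG stream. A minimal sketch, assuming a local two-worker cluster stands in for the real machines; the host names in the comment are hypothetical:

```r
library(parallel)

# Two local workers stand in for the real cluster. For several machines one
# would pass host names instead, e.g. makeCluster(c("node01", "node02"))
# (hypothetical host names).
cl <- makeCluster(2)

# Give each worker its own L'Ecuyer-CMRG stream, all derived from one seed.
clusterSetRNGStream(cl, iseed = 2011)

fxT2 <- function(i) runif(10)
run1 <- parLapply(cl, 1:100, fxT2)

# Resetting the streams with the same seed reproduces the run.
clusterSetRNGStream(cl, iseed = 2011)
run2 <- parLapply(cl, 1:100, fxT2)

stopCluster(cl)

identical(run1, run2)             # TRUE
anyDuplicated(unlist(run1)) == 0  # no value repeated across workers expected
```

Because every stream derives from `iseed`, resetting it reproduces the run exactly, provided the number of workers (and hence the split of tasks across them) stays the same.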

Dirk Eddelbuettel

You need to use an RNG specifically designed for parallel computing. See the "Parallel computing: Random numbers" section of the High Performance Computing Task View.
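For the setup in the question, where each machine runs its own R process and its own mclapply, the coordination can be done by hand with the L'Ecuyer-CMRG generator from the parallel package: every machine starts from the same master seed, machine k jumps to stream k with `nextRNGStream()`, and task j jumps to substream j with `nextRNGSubStream()`. The scheme and the helper names (`machine_seed`, `task_seed`, `run_machine`) are assumptions for illustration; in practice each machine would also need to be told its own id, for example through a command-line argument.

```r
library(parallel)

# Hand-rolled scheme (an assumption, not a library API): all machines share one
# master seed; machine k uses L'Ecuyer-CMRG stream k, and task j on that machine
# uses substream j of that stream, so no two tasks anywhere share a substream.
RNGkind("L'Ecuyer-CMRG")
set.seed(2011)
master_seed <- .Random.seed

# Hypothetical helper: advance the master seed to one machine's stream.
machine_seed <- function(master, machine_id) {
  s <- master
  for (k in seq_len(machine_id)) s <- nextRNGStream(s)
  s
}

# Hypothetical helper: advance a machine's stream to one task's substream.
task_seed <- function(machine, task_id) {
  s <- machine
  for (j in seq_len(task_id)) s <- nextRNGSubStream(s)
  s
}

# The question's task, now seeded explicitly from its own substream.
fxT2 <- function(i, seed) {
  assign(".Random.seed", task_seed(seed, i), envir = globalenv())
  runif(10)
}

# Hypothetical driver: what machine `machine_id` would run.
run_machine <- function(machine_id, nessay = 100) {
  seed <- machine_seed(master_seed, machine_id)
  cores <- if (.Platform$OS.type == "windows") 1L else 2L
  unlist(mclapply(seq_len(nessay), fxT2, seed = seed, mc.cores = cores))
}

m1 <- run_machine(1)
m2 <- run_machine(2)
```

Since each task's seed depends only on the machine id and the task index, the results do not depend on how mclapply schedules tasks across cores, and any single replication can be rerun in isolation.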

Joshua Ulrich