42

I realize that one uses set.seed() in R for pseudo-random number generation. I also realize that using the same number, like set.seed(123), ensures you can reproduce results.

But what I don't get is what the values themselves mean. I am playing with several functions, and some use set.seed(1) or set.seed(300) or set.seed(12345). What does that number mean (if anything), and when should I use a different one?

For example, in a book I am working through, they use set.seed(12345) when creating a training set for decision trees. Then, in another chapter, they use set.seed(300) when creating a Random Forest.

I just don't get the number.

mylesg
  • 6
    does this help? http://stackoverflow.com/questions/14684437/what-does-the-integer-while-setting-the-seed-mean Also ?set.seed() within R provides pretty good information. – doug.numbers Feb 12 '14 at 02:31
  • 7
    The main point of using the seed is to be able to reproduce a particular sequence of 'random' numbers. Generally speaking, if you don't need to be able to do that, you *wouldn't* set the seed. The seed itself carries no inherent meaning except it's a way of telling the random number generator 'where to start'. You might think of it a bit like the relationship between a PIN number and your bank account. The PIN is associated with a long string of numbers (your account number), but it's not inherently an interpretable quantity (there is *an* interpretation, but in setting it, you ignore that). – Glen_b Feb 12 '14 at 03:38
  • 5
    For the record, 42 is always the right seed – Repmat Jun 11 '18 at 04:03
  • Answered here: [What exactly is a seed in a random number generator?](https://stats.stackexchange.com/questions/354373/what-exactly-is-a-seed-in-a-random-number-generator) – Ben Oct 16 '19 at 00:00
  • Just a comment: I recommend setting the random generator seed only (i) to debug a script, find particular errors, etc., or (ii) to send/publish results so they can be checked. – AADF Oct 15 '19 at 23:01

3 Answers

39

The seed number you choose is the starting point used in the generation of a sequence of random numbers, which is why (provided you use the same pseudo-random number generator) you'll obtain the same results given the same seed number. As far as your second question is concerned, this short snippet from the description of the equivalent functionality in Stata might be helpful:

We cannot emphasize this enough: Do not set the seed too often. To see why this is such a bad idea, consider the limiting case: You set the seed, draw one pseudorandom number, reset the seed, draw again, and so continue. The pseudorandom numbers you obtain will be nothing more than the seeds you run through a mathematical function. The results you obtain will not pass for random unless the seeds you choose pass for random. If you already had such numbers, why are you even bothering to use the pseudorandom-number generator?

http://www.stata.com/manuals13/rsetseed.pdf
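
Both points above can be checked with a minimal sketch in R. It assumes the default generator settings and uses runif() purely as an example draw; draw_after_reseed is a hypothetical helper introduced for illustration, not part of base R.

```r
# Same seed, same generator: the draws repeat exactly.
set.seed(123)
first_run <- runif(5)
set.seed(123)
second_run <- runif(5)
identical(first_run, second_run)  # TRUE

# The limiting case from the Stata quote: reseed before every single draw.
# draw_after_reseed is a hypothetical helper, not part of base R.
draw_after_reseed <- function(seed) {
  set.seed(seed)
  runif(1)
}
seeds <- 1:5
draws_a <- vapply(seeds, draw_after_reseed, numeric(1))
draws_b <- vapply(seeds, draw_after_reseed, numeric(1))
identical(draws_a, draws_b)  # TRUE: each draw is just a fixed function of its seed
```

In the second half, the "random" values are fully determined by the list of seeds, which is the situation the Stata documentation warns about.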

crcvd
  • 17
    Who knew Stata had such interesting documentation: "Others try to make up a random number, figuring if they include enough digits, the result just has to be random. This is a variation on the five-second rule for dropped food, and we admit to using both of these rules" – degenerate hessian Feb 23 '17 at 18:30
  • I dunno. I know it's kind of a joke, but setting the seed "too often" is not a big deal -- certainly nothing that one "cannot emphasize enough". Setting the seed just means you get the same sequence. The Monte Carlo error from that sequence is a function of the discrepancy (i.e., gappyness), not whether it is "really random". A fixed pseudorandom sequence is not any more or less problematic than the fixed interval sequence 1/n, 2/n, 3/n, ..., 1. – Robert Dodier Jan 07 '21 at 04:35
  • It's not clear that "The results you obtain will not pass for random unless the seeds you choose pass for random" is true. For seeds 1, 2, 3, ... you'll get bits sprayed all over the range [0, 2^whatever). Whether such a sequence passes typical RNG tests is an empirical question; it is certainly not true or false on the face of it. – Robert Dodier Jan 07 '21 at 04:39
4

In short, the numbers themselves don't really mean anything! If you are looking at someone else's code (like in the two examples you gave above), the numbers don't alter what the function does, nor are there "good" numbers for specific functions. It's just down to the authors' choice.

Further, if you only ever set the seed once in your code, then you can choose essentially any number you like. The only thing to be a bit careful of is that, if you interface with any other functions that also use random numbers, it's good to choose a non-obvious seed (so it's less likely that you both use the same seed).

However, as Corcovado nicely points out, some applications require real care in the choice. If you need several seeds (for example, because you reseed repeatedly), then there can't be a pattern to the numbers you choose.
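
As a rough illustration that the seed value carries no meaning of its own, the sketch below reuses the three seeds from the question. The sample size of 10000 and the use of runif() are arbitrary choices for the example, not anything taken from the book.

```r
# The three seeds mentioned in the question.
seeds <- c(1, 300, 12345)

# For each seed, draw 10000 uniform numbers and take their mean.
sample_means <- vapply(seeds, function(s) {
  set.seed(s)
  mean(runif(10000))
}, numeric(1))
sample_means  # three slightly different values, all close to 0.5

# Different seeds give different streams...
set.seed(1)
a <- runif(3)
set.seed(300)
b <- runif(3)
identical(a, b)  # FALSE
```

The streams differ, but none of them is "better" for decision trees or random forests; each seed simply selects a different, equally usable starting point.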

KRS
1

The set.seed() function in R takes an (arbitrary) integer argument. So we can pass any integer, say, 1 or 123 or 300 or 12345, to get reproducible random numbers.

Also, in the TeachingDemos package, the char2seed function allows the user to set the seed based on a character string.
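
A minimal sketch of the char2seed approach, assuming the TeachingDemos package is installed (the code skips the comparison otherwise). The string "decision trees" is an arbitrary example, not a recommended value.

```r
# char2seed() comes from the TeachingDemos package (not base R), so this
# sketch only runs the comparison if that package is available.
same_stream <- NA
if (requireNamespace("TeachingDemos", quietly = TRUE)) {
  TeachingDemos::char2seed("decision trees")  # arbitrary example string
  first_run <- runif(3)
  TeachingDemos::char2seed("decision trees")  # same string, same seed
  second_run <- runif(3)
  same_stream <- identical(first_run, second_run)
}
same_stream  # NA if the package is missing
```

A string seed can be easier to remember or document (for instance, a project name) than an arbitrary integer, but it plays exactly the same role.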

Dr Nisha Arora