
It seems like everyone just uses set.seed(123) or set.seed(1234) when they are doing random sampling. If so many people use just a select few integers for set.seed(), doesn't that mean that everyone starts from the same state of the random number generator, and therefore that the results are not truly random samples?

Stephan Kolassa
conv3d
  • 3
    People use a specific seed for didactic purposes, so that they and others can reproduce _the example_. And they set a random seed (and would prefer the Mersenne Twister) when doing real random sampling to solve the _real tasks_ which they will report. – ttnphns Apr 07 '16 at 08:34
  • 4
    If you have any evidence for this claim you should produce it. The most common seed I'm aware of is the current time. – user207421 Apr 07 '16 at 08:35
  • 3
    Very nearly the same question is addressed at http://stats.stackexchange.com/questions/80407 . It asks whether repeatedly using the same seed "creates bias." – whuber Apr 07 '16 at 17:05

1 Answer


An interesting question, though I don't know whether it's answerable here at CV. A few thoughts:

  • If you do an analysis involving random sampling, it's always a good idea to re-run it with different seeds, just to assess whether your results are sensitive to the choice of seed. If your results vary "much", you should revisit your analysis (and/or your code).

    If everyone did this, I wouldn't worry overly about the aggregate effect of everyone in the end using the same seed, because after this sanity check, everyone's results don't depend on it too much any more.

  • Given that random numbers are used in many, many, many different contexts, with different models used in different applications, transforming the pseudorandom numbers in different orders and in different ways, I wouldn't worry too much about a possible systematic effect overall, even though, yes, such an effect could in theory be visible on an aggregate level even when it is not visible to each separate researcher as per the previous bullet point.

  • Finally, I personally never use 123 or 1234 as seeds. I use 1 ;-) Or the year. Or the date. I really don't think 123 or 1234 are all that prevalent as seeds. You could of course set up a poll somewhere.
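
The re-seeding check from the first bullet can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the data vector `x`, the helper name `boot_mean`, and the particular seeds are all made up for the example, and a bootstrap mean merely stands in for whatever analysis you actually run.

```r
# Hypothetical data and analysis, for illustration only.
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 4.4)

# Assumed helper: bootstrap estimate of the mean of x under a given seed.
boot_mean <- function(seed, n_boot = 2000) {
  set.seed(seed)
  mean(replicate(n_boot, mean(sample(x, replace = TRUE))))
}

# Re-run the same analysis under several arbitrary seeds.
seeds <- c(1, 42, 123, 1234, 2016)
estimates <- vapply(seeds, boot_mean, numeric(1))

# The same seed reproduces the same result exactly.
stopifnot(identical(boot_mean(123), boot_mean(123)))

# Different seeds give slightly different results; inspect the spread.
print(data.frame(seed = seeds, estimate = estimates))
print(diff(range(estimates)))
```

If the spread across seeds is small relative to the precision you care about, the choice of seed is not driving your conclusions. If it is large, more replications or a second look at the analysis is probably in order.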

Stephan Kolassa
  • 8
    +1, for "re-seeding" tests... Personally I always use 42... meaning of life and all. – Repmat Apr 07 '16 at 07:54
  • 6
    A friend of mine once had a problem publishing a simulation paper because he used the `666` seed and it was considered "inappropriate" by the reviewer. It was a complicated simulation study, so re-running it was a time-consuming process :) – Tim Apr 07 '16 at 08:24
  • 4
    +1. The main point of setting a seed is, I take it, to ensure reproducibility. If the numbers really do "depend on" the seed, the sample was too small or the analysis flawed, assuming (which we cannot) the use of state-of-the-art random number generators. – Nick Cox Apr 07 '16 at 08:34
  • 1
    @Tim: I wish the only thing my reviewers required of me was rerunning the analysis with a different seed, even though this does take computing time. Still better than requiring me to do actual *work*. (Uwe's maxim: "Computing is cheap, and thinking hurts.") – Stephan Kolassa Apr 07 '16 at 08:36
  • 2
    See http://stackoverflow.com/questions/34315874/stata-access-element-of-matrix-as-scalar-or-macro for an example of "trying out different seeds to see which one gives the best results" (the OP's exact words). The OP didn't budge even given critical comments from two people, myself and @SteveSamuels. I leave the discussion to speak for itself. – Nick Cox Apr 07 '16 at 08:44
  • +0.7124225.... I agree with the spirit of the answer, but as a habitual user of `set.seed(123)` I think it is unjustifiably vilified, so I deduct `set.seed(123); runif(1)` points from my original +1 upvote. – usεr11852 Mar 21 '18 at 22:10