
Many people call random.seed() at the beginning of their Python code when training a machine learning model. As I understand it, they want to control the randomness so that they can compare different hyper-parameters. But, as the following Python snippet demonstrates, the random integer is only reproducible when the seed is set immediately before each call. Why bother setting the seed once if it does not control the randomness throughout the training process?

>>> import random
>>> random.seed(5)
>>> random.randint(1,10)
10
>>> random.randint(1,10)
5
>>> random.seed(5);random.randint(1,10)
10
>>> random.seed(5);random.randint(1,10)
10

Example of the seed-setting code:

import os
import random

import numpy as np
import torch

def set_seed(seed: int = 42):
    random.seed(seed)                          # Python's built-in RNG
    np.random.seed(seed)                       # NumPy's global RNG
    os.environ["PYTHONHASHSEED"] = str(seed)   # hash seed for Python's hash randomization
    torch.manual_seed(seed)                    # PyTorch CPU RNG
    torch.cuda.manual_seed(seed)               # PyTorch GPU RNG (current device)
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = True      # note: usually set to False for strict reproducibility

In my code, set_seed is called only once, at the start of the script.
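For context, this is roughly how it is used (a minimal sketch; the tensor and the permutation below just stand in for the parameter initialisation and data shuffling done in the real training code):

import numpy as np
import torch

set_seed(42)  # called exactly once, at the top of the script

# Every random operation below draws from the RNGs seeded above.
weights = torch.randn(4, 4)              # stands in for parameter initialisation
batch_order = np.random.permutation(10)  # stands in for shuffling the dataset
print(weights)
print(batch_order)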

mdewey

1 Answer


The answer is simple: to be able to reproduce the results. Setting the seed once at the start fixes the entire sequence of random numbers generated afterwards, not just the first one, so re-running the same code yields the same results. Sometimes setting one seed is not enough, e.g. when the libraries you call use their own random number generators (e.g. TensorFlow in CPU mode; GPU mode is another story, because some GPU operations are nondeterministic).
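For example, here is a minimal sketch with a toy draw_sequence function standing in for a full training run: the seed is set once at the top, and the whole run is reproducible.

import random

import numpy as np


def draw_sequence(seed: int = 42):
    # Seed each RNG once, then take several draws from it.
    random.seed(seed)
    np.random.seed(seed)
    py_draws = [random.randint(1, 10) for _ in range(5)]
    np_draws = np.random.randint(1, 11, size=5).tolist()  # upper bound is exclusive in NumPy
    return py_draws + np_draws


# Two "runs" with the same seed produce identical sequences of draws,
# even though consecutive draws within one run differ from each other:
print(draw_sequence(42) == draw_sequence(42))  # True

If the training code also uses PyTorch (as in your set_seed), its generators have to be seeded as well, which is exactly why such helper functions seed several libraries at once.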

Michael M