Depending on the dataset they examine, they need at most 32 or 64 trials; for their experiments they tried 256 (p. 291, line 6).
In general this is a slightly "how long is a piece of string" question unless you use some adaptive search technique like Simulated Annealing (SA) or Bayesian Optimisation (BO) (i.e. Gaussian Process Regression used for optimisation) to explore the hyper-parameter space. Differences between SA and BO are briefly discussed here. The authors address this issue by explicitly using the latter (Gaussian Process Regression; see the bottom paragraphs of pp. 291 & 292); a minimal BO sketch follows below.
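To make the BO idea concrete, here is a minimal sketch using scikit-optimize's `gp_minimize` as a stand-in for the GP machinery the authors use; the objective function, search space, and hyper-parameter names below are purely hypothetical and you would substitute your own model-fitting routine:

```python
# Illustrative GP-based hyper-parameter search with scikit-optimize.
# This is NOT the authors' code; it just shows the BO workflow.
from skopt import gp_minimize
from skopt.space import Real

def validation_loss(params):
    """Hypothetical objective: stand-in for 'train model, return validation error'."""
    lr, l2 = params
    return (lr - 0.01) ** 2 + 0.1 * (l2 - 1e-4) ** 2

search_space = [
    Real(1e-5, 1e-1, prior="log-uniform", name="lr"),  # learning rate
    Real(1e-6, 1e-2, prior="log-uniform", name="l2"),  # L2 penalty
]

result = gp_minimize(
    validation_loss,
    search_space,
    n_calls=32,       # a budget comparable to the 32-64 trials mentioned above
    random_state=0,
)
print("best params:", result.x, "best loss:", result.fun)
```

The GP surrogate lets each new trial be placed where the model expects the most improvement, which is exactly why a fixed budget of a few dozen trials can suffice.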
In the second half of their paper they also check quasi-random sampling approaches (simply speaking: a random search that ensures some uniformity of the samples across the sampling space) and they find these are better than grid search too. The main reason that "random search is better than grid search" can be seen in the schematic shown in Fig. 1; in short, random search does not lose (much) efficiency by spending trials on irrelevant hyper-parameter dimensions. It allows a more thorough exploration along the relevant dimensions without wasting trials on repeated values of the irrelevant ones (a small sketch of this intuition follows below).
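The following sketch illustrates both points under a toy assumption: of two hyper-parameters, only `x` affects the loss. A 3x3 grid then probes just 3 distinct values of `x`, while 9 plain-random points probe 9, and a quasi-random Sobol sequence (via `scipy.stats.qmc`) adds uniformity on top of that. All names and the objective are hypothetical:

```python
# Toy illustration of the Fig. 1 intuition: grid vs random vs quasi-random
# search when one of two hyper-parameters is irrelevant.
import numpy as np
from scipy.stats import qmc

def objective(x, y):
    # y is deliberately irrelevant: only x moves the loss
    return (x - 0.3) ** 2

rng = np.random.default_rng(0)

# Grid search: 9 trials, but only 3 distinct x values
grid = np.array([[x, y] for x in (0.1, 0.5, 0.9) for y in (0.1, 0.5, 0.9)])

# Plain random search: 9 trials, 9 distinct x values
random_pts = rng.random((9, 2))

# Quasi-random (Sobol) search: 2^3 = 8 low-discrepancy points
sobol_pts = qmc.Sobol(d=2, scramble=True, seed=0).random_base2(m=3)

for name, pts in [("grid", grid), ("random", random_pts), ("sobol", sobol_pts)]:
    losses = [objective(x, y) for x, y in pts]
    print(f"{name:>6}: best loss = {min(losses):.4f}")
```

With the same trial budget, the grid can only ever get as close to the optimum along `x` as its 3 grid lines allow, while the (quasi-)random samples cover 8-9 distinct `x` values, which is the efficiency argument made in the paper.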