The claim in your quotation has a very straightforward proof, and the result does not depend on the data dimension.
Random search is used in a scenario where we don't know much about the function $f$ we seek to optimize. The function is allowed to be very general. In particular, we admit non-convex $f$ and functions $f$ for which the gradient is unavailable. However, we do know two things which allow us to make progress in optimizing $f$:
- We define a set $\mathcal{X}$ that we're searching within. Typically, this set is just some rectangle which we guess contains the optimal point somewhere within it. (Of course, if the cardinality of $\mathcal{X}$ is small, we can just try all $x\in\mathcal{X}$ and we'll know for certain the best value. But if $\mathcal{X}\subset\mathbb{R}^d$, it's impossible to enumerate all $x$, so we'll need a different approach.)
- We can query the function. That is, for some $x \in \mathcal{X}$, we can obtain the value $f(x)$.
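To make this concrete, here is a minimal sketch of random search in Python. The objective `f`, the search rectangle, and the dimension are placeholder assumptions invented for illustration; only the pattern (sample uniformly, query, keep the best) is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Placeholder black-box objective; swap in your own function.
    return -np.sum((x - 0.3) ** 2)

# Hypothetical search rectangle [low, high]^d guessed to contain a good point.
low, high, d = -1.0, 1.0, 5

def random_search(n):
    """Query f at n points drawn uniformly from the rectangle;
    return the best point and value found."""
    best_x, best_val = None, -np.inf
    for _ in range(n):
        x = rng.uniform(low, high, size=d)
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

print(random_search(60))
```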
Now, it's reasonable to ask "If I make $n$ queries of $f$ using $n$ randomly selected points in $\mathcal{X}$, what's the probability $p$ that one or more of my queries is among the largest $100(1-q)$ percent of all function values $f(x)$ such that $x\in\mathcal{X}$?" We can answer this definitively using some results from probability theory.
So we choose the probability $p$ with which we want at least one result to land above the quantile $q$, and we can show that $n \approx 60$, as claimed in the quotation. This result does not depend on the dimension of $\mathcal{X}$.
Suppose your quantile is $q = 0.95$ and you want a $p=0.95$ probability that the model results are in the top $100(1-q)=5$ percent of all hyper-parameter tuples. The probability that all $n$ attempted tuples are not in that window is $q^n = 0.95^n$ (because they are chosen independently at random from the same distribution), so the probability that at least one tuple is in that region is $1 - 0.95^n$. Putting it all together, we have
$$
1 - q^n \ge p \implies n \ge \frac{\log(1 - p)}{\log(q)}
$$
which in our specific case yields $n \ge \frac{\log(0.05)}{\log(0.95)} \approx 58.4$; since $n$ must be an integer, $n = 59$ queries suffice, whence your author rounded up to 60.
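As a sanity check, this bound is easy to evaluate numerically. A small Python snippet (standard library only):

```python
import math

def queries_needed(p, q):
    """Smallest n such that 1 - q**n >= p, i.e. n >= log(1 - p) / log(q)."""
    return math.ceil(math.log(1 - p) / math.log(q))

print(queries_needed(0.95, 0.95))  # -> 59
```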
We can get some intuition about how random search works if we consider a special dartboard. I divide the dartboard into 100 equal-sized wedges and label each wedge with a different integer. Because each wedge has a unique integer, we know that exactly one of the wedges is labeled with the largest number. Likewise, the five largest numbers are in the top 5% by definition.
In this game, I'm throwing darts at a dartboard in a dark room, so I can't see the board. This means that when I throw a dart at the dartboard, it will land uniformly at random on the board. This is much like random search: we know there is a wedge labeled with the largest value somewhere on the board, but we can't see where it is because we're in the dark.
Now let's consider several games.
- I throw $n=1$ dart at the board. The probability that I hit a wedge among the largest 5% of wedges is $\frac{5}{100}=1-0.95^1=0.05$.
- I throw $n=20$ darts. The probability that at least one dart hits a wedge among the largest 5% is the complement of the probability that all darts miss that region: $1-0.95^{20}\approx 0.64151$.
- I throw $n=58$ darts. By the same reasoning, $1-0.95^{58}\approx 0.94895$.
- I throw $n=59$ darts: $1-0.95^{59}\approx 0.95150$.
- I throw $n=60$ darts: $1-0.95^{60}\approx 0.95393$.
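If you'd rather check the dart game empirically, here is a quick Monte Carlo sketch (using NumPy; the number of trials and the seed are arbitrary choices) that reproduces these probabilities to within simulation noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000  # number of simulated games per dart count

def hit_top_5pct(n_darts):
    """Fraction of games in which at least one of n_darts uniform throws
    lands on one of the 5 top-valued wedges (out of 100)."""
    # Each dart lands on wedge 0..99 uniformly; call wedges 95..99 the top 5%.
    throws = rng.integers(0, 100, size=(n_trials, n_darts))
    return (throws >= 95).any(axis=1).mean()

for n in (1, 20, 58, 59, 60):
    print(n, hit_top_5pct(n))  # ~0.05, ~0.64, ~0.949, ~0.951, ~0.954
```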