I have a question about "Random Search Methods" in optimization and machine learning. In short: are there theoretical results that make the seemingly obvious point precise, namely that "Random Search Methods" are slow and inefficient, and are therefore not favored for estimating the parameters of statistical models compared to algorithms like Gradient Descent?
Suppose you have a function "f" of three variables, f(x, y, z), that you are trying to minimize.
To illustrate the question, suppose for argument's sake that we want to use "random search", i.e. query "f" at randomly chosen points.
A) My (naive) understanding of "random search" is as follows: we randomly query "f" at some point f(x = a, y = b, z = c) and record the value of "f". We repeat this 1000 times and keep the combination of (x, y, z) that gives the smallest value of "f". (A small code sketch of both A) and B) follows after B).)
B) However, it seems there is an alternate way to generate the random points: https://en.wikipedia.org/wiki/N-sphere#Generating_random_points . If I understand this correctly, Marsaglia showed a way to generate points uniformly on a sphere. If the function "f" is in 3 dimensions, generate 3 standard normal (Gaussian) random numbers, take the square root of the sum of their squares, and call this "r". Then divide the vector of these 3 random numbers by "r". This vector is the first point at which you evaluate the function "f" - now repeat this 1000 times and choose the combination that results in the smallest value of "f". (Note: apparently the more naive approach of rejection sampling from an enclosing cube scales poorly when "f" is in many dimensions, which is why a construction like this is used.)
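For concreteness, here is a minimal sketch of what I mean by A) and B), assuming a toy quadratic stand-in for "f", a [-1, 1]^3 box for the plain random search in A), and NumPy for the sampling (the toy objective and the search ranges are just placeholders I made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(point):
    # Toy quadratic standing in for "f" -- just a placeholder for illustration.
    x, y, z = point
    return (x - 0.3) ** 2 + (y + 0.1) ** 2 + (z - 0.7) ** 2

n_iter = 1000
dim = 3

# A) Plain random search: sample each coordinate independently (here, uniformly
#    in the box [-1, 1]^3) and keep the point with the smallest observed f.
box_points = rng.uniform(-1.0, 1.0, size=(n_iter, dim))
best_a = min(box_points, key=f)

# B) Sampling via the normal-deviate construction from the Wikipedia page:
#    draw standard normal deviates, let r be the square root of the sum of
#    squares, and DIVIDE by r so the point lands uniformly on the unit sphere.
gauss = rng.standard_normal(size=(n_iter, dim))
r = np.sqrt((gauss ** 2).sum(axis=1, keepdims=True))
sphere_points = gauss / r
best_b = min(sphere_points, key=f)

print("best point, box sampling:   ", best_a, "f =", f(best_a))
print("best point, sphere sampling:", best_b, "f =", f(best_b))
```

If I am reading the Wikipedia construction correctly, the key step in B) is dividing by "r" (not multiplying by it), so that the points are spread uniformly over the unit sphere rather than clustered like points drawn from a box.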
Question 1: I am a bit confused - when we talk about "random search", are we referring to "A)" or to "B)"?
Question 2: I have always wondered whether there exist any theoretical results on the "convergence of random search methods". Are there mathematical theorems that state the obvious about "random search" - that if your dataset has too many rows and too many columns, then, statistically speaking, random search becomes highly ineffective at optimizing a loss function?
Are there mathematical results that link the number of iterations required, for a given number of rows/columns, to a given error bound on a function (e.g. a loss function) of a given complexity? Are there results that say how many iterations "random search" needs before it converges?
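To make my intuition concrete, here is the kind of back-of-the-envelope calculation I have in mind (this is just my own sketch, assuming we sample uniformly from the unit cube $[0,1]^d$ and want to land within distance $\varepsilon$ of the minimizer $x^*$):

$$ p \;=\; \Pr\big(\|X - x^*\|_\infty \le \varepsilon\big), \qquad \varepsilon^d \;\le\; p \;\le\; (2\varepsilon)^d \quad \text{for } X \sim \mathrm{Uniform}\big([0,1]^d\big), $$

$$ (1-p)^N \le \delta \quad\Longleftrightarrow\quad N \;\ge\; \frac{\log(1/\delta)}{-\log(1-p)} \;\approx\; \frac{\log(1/\delta)}{p} \;\ge\; \frac{\log(1/\delta)}{(2\varepsilon)^d}. $$

If this is right, the number of iterations N needed to get close to the minimizer with probability $1-\delta$ blows up exponentially in the dimension d (i.e. the number of parameters being searched over), while the number of rows in the dataset mainly affects the cost of each individual evaluation of the loss. Is this kind of argument made rigorous anywhere?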
Thus, how can we answer the obvious question: why do modern statistical models not use "Random Search Methods" for optimizing their loss functions and estimating their parameters?
Thanks!