The answer to all of your questions is no.
Question 1
Supervised learning is often formulated as an optimization problem, but this is not always the case. In addition to the counterexamples Sycorax mentioned, there's an entire world of Bayesian models where learning doesn't necessarily involve optimization. Here, the goal is to estimate a probability distribution over parameters or function outputs, and this involves integrating over all possibilities rather than optimizing.
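To make the contrast concrete, here's a hedged toy sketch (synthetic coin-flip data, all numbers made up): the posterior over a coin's bias is obtained by normalizing, i.e. integrating, over the whole parameter space, and the prediction averages over that posterior rather than plugging in a single optimized estimate.

```python
import numpy as np

# Toy illustration (synthetic data): Bayesian learning for a coin's bias theta.
# Nothing is optimized; we integrate over all values of theta instead.
rng = np.random.default_rng(0)
data = rng.binomial(1, 0.7, size=20)       # 20 coin flips, true bias 0.7
heads, n = data.sum(), len(data)

theta = np.linspace(1e-6, 1 - 1e-6, 2001)  # grid over the parameter space
prior = np.ones_like(theta)                # flat Beta(1, 1) prior
likelihood = theta**heads * (1 - theta)**(n - heads)

# Posterior via normalization (numerical integration), not via an optimizer.
unnorm = prior * likelihood
posterior = unnorm / unnorm.sum()

# Predictive probability of heads: an average over all plausible parameters ...
p_heads_bayes = (theta * posterior).sum()
# ... versus a point estimate obtained by optimization (here the MLE).
p_heads_mle = heads / n

print(p_heads_bayes, p_heads_mle)
```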
Question 2
One can define equivalence classes of methods however one likes. But, I'd argue that "the set of all methods that optimize an objective function" is not a very useful categorization, and that it neglects many important differences. Among them...
Neural nets are simply a class of functions. They can be used for supervised learning, but also for other purposes. In contrast, SVMs and random forests define both a class of functions (i.e. hypothesis space) and a learning algorithm (which is a procedure that maps datasets to functions in the hypothesis space). The SVM and random forest learning algorithms are specifically formulated in a supervised learning context. One can define variants of these methods that work in other contexts, but can't strictly call these variants 'SVMs' or 'random forests'. From this perspective, neural nets are not in the same category as SVMs and random forests.
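As a rough illustration of that distinction (the library choices here are mine, not part of any standard definition): a neural net can be written down as nothing more than a parameterized function with no learning procedure attached, whereas an SVM or random forest object only makes sense together with its supervised fitting procedure.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# A (tiny) neural net is just a parameterized function: nothing here says
# how, or even whether, the parameters get learned.
def neural_net(x, params):
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

# It can be paired with any objective: a supervised loss, a reinforcement-learning
# return, a density model's likelihood, and so on.

# By contrast, these objects are defined as a hypothesis space *plus* a specific
# supervised learning algorithm; "using an SVM" means calling .fit(X, y).
svm = SVC(kernel="rbf")
forest = RandomForestClassifier(n_estimators=100)
# svm.fit(X, y); forest.fit(X, y)   # supervised data is baked into the procedure
```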
In the context of supervised learning, neural nets, random forests, and SVMs have been widely observed to perform differently on different datasets. This is a consequence of different inductive biases. However, inductive bias also depends on many specific choices for each method (e.g. data preprocessing, network architecture, learning algorithm, kernel choice, hyperparameter optimization procedure, etc.).
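For instance, here's a quick (and deliberately simplistic) comparison on a synthetic dataset. The exact scores depend on every one of the choices listed above, so treat it as illustrative only.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Toy comparison on synthetic data; results vary with architecture, kernel,
# preprocessing, hyperparameters, random seed, etc.
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "neural net": make_pipeline(StandardScaler(),
                                MLPClassifier(hidden_layer_sizes=(32,),
                                              max_iter=2000, random_state=0)),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, model.score(X_te, y_te))
```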
Additionally, there are important differences in the practical implementation of these methods (e.g. regarding learning/optimization and computational requirements).
Question 3
It's true that feedforward neural nets are universal function approximators (though this has a very specific meaning, and comes with caveats). It's also true that recurrent neural nets are Turing complete. First, note that these results apply to general classes of neural nets; a given instantiation or particular subclass may not have these properties. Second, these results don't say anything about learning from data. And third, they're of limited utility in an applied setting, as Sycorax mentioned.
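As a small illustration of what the universal approximation theorem does and doesn't promise, here's a hedged sketch: a single-hidden-layer network approximating a smooth function on a compact interval, which is the setting the theorem speaks to.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# One hidden layer approximating a smooth function on a compact interval.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
target = lambda x: np.sin(2 * x) + 0.5 * np.cos(5 * x)
y = target(X[:, 0])

net = MLPRegressor(hidden_layer_sizes=(200,), activation="tanh",
                   max_iter=5000, random_state=0).fit(X, y)

X_grid = np.linspace(-3, 3, 500).reshape(-1, 1)
err = np.max(np.abs(net.predict(X_grid) - target(X_grid[:, 0])))
print(f"max approximation error on [-3, 3]: {err:.3f}")
# The theorem guarantees that *some* width makes this error arbitrarily small;
# it does not guarantee that training will find those weights, and it says
# nothing about behaviour outside [-3, 3] or about generalization from finite data.
```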
Connection between neural nets and random forests
Various papers have explored the connections between neural nets and random forests or decision trees. For example, see Welbl (2014). Casting Random Forests as Artificial Neural Networks (and Profiting from It). Given a trained random forest, one can explicitly construct a neural net that implements the same function. I don't know whether or not there are papers exploring the reverse. But, one could make the following trivial argument: The class of decision trees of unbounded depth can represent any function by effectively acting as an infinite lookup table. The same property extends to random forests, since they're composed of decision trees. So, given any neural net, there exists a random forest that implements the same function. Of course, this isn't particularly interesting from a practical perspective.
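To make the forward direction concrete, here's a hedged sketch in the spirit of Welbl (2014) (the function names are mine, and this is not the paper's exact construction): a fitted decision tree can be evaluated exactly as a two-hidden-layer network with threshold units, one unit per split and one per leaf; a random forest is then just a sum or average of such networks.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

def step(z):
    # Hard threshold; ties (z == 0) map to 0, matching sklearn's "x <= threshold goes left".
    return np.heaviside(z, 0.0)

def tree_as_network(tree):
    """Evaluate a fitted decision tree as a two-hidden-layer threshold network:
    layer 1 has one unit per split, layer 2 has one unit per leaf, and the
    output layer sums leaf values."""
    t = tree.tree_
    internal = [i for i in range(t.node_count) if t.children_left[i] != -1]
    unit = {node: k for k, node in enumerate(internal)}   # split node -> hidden unit

    # Record, for each leaf, the splits on its root-to-leaf path and the direction taken.
    paths = {}
    def walk(node, path):
        if t.children_left[node] == -1:
            paths[node] = path
        else:
            walk(t.children_left[node], path + [(node, -1.0)])   # went left
            walk(t.children_right[node], path + [(node, +1.0)])  # went right
    walk(0, [])

    def predict(X):
        # Layer 1: unit j fires iff the sample goes *right* at split j.
        Z1 = np.column_stack([step(X[:, t.feature[n]] - t.threshold[n]) for n in internal]) \
             if internal else np.zeros((X.shape[0], 0))
        out = np.zeros(X.shape[0])
        # Layer 2: a leaf unit fires iff every split on its path agrees with layer 1.
        for leaf, path in paths.items():
            n_right = sum(1 for _, d in path if d > 0)
            score = sum(d * Z1[:, unit[n]] for n, d in path) if path else np.zeros(X.shape[0])
            active = step(score - (n_right - 0.5))
            out += t.value[leaf].ravel()[0] * active
        return out

    return predict

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
net = tree_as_network(tree)
assert np.allclose(net(X), tree.predict(X))   # identical function on these inputs
```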
Connection between neural nets and SVMs
There's a well-known equivalence between kernel machines and feedforward neural nets with a single (nonlinear) hidden layer and a linear output. The universal approximation theorem applies to this class of networks. So, assuming we can use any kernel function, and restricting ourselves to conditions under which the UAT holds, a kernel machine can approximate any function a deeper network can. But, note that this doesn't imply that a deep net and a kernel machine would produce the same output when trained on a finite dataset.
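One concrete way to see the correspondence (a sketch using random Fourier features, not a claim about how practitioners actually train either model): a one-hidden-layer network with fixed random cosine units and a trained linear output layer approximates an RBF kernel machine, and the two agree more closely as the number of hidden units grows.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# One-hidden-layer network with fixed random cosine units + linear readout
# (random Fourier features) versus the corresponding RBF kernel machine.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1]) + 0.1 * rng.normal(size=500)

gamma, alpha = 0.5, 1e-2
kernel_machine = KernelRidge(kernel="rbf", gamma=gamma, alpha=alpha).fit(X, y)

# "Hidden layer": 2000 random, untrained cosine units; only the output layer is fit.
one_layer_net = make_pipeline(
    RBFSampler(gamma=gamma, n_components=2000, random_state=0),
    Ridge(alpha=alpha, fit_intercept=False),
).fit(X, y)

X_test = rng.uniform(-3, 3, size=(200, 2))
gap = np.max(np.abs(kernel_machine.predict(X_test) - one_layer_net.predict(X_test)))
print(f"max |kernel machine - one-hidden-layer net| on test points: {gap:.3f}")
# The two converge as the number of random units grows, but this says nothing
# about what a *deep*, fully trained network would learn on the same data.
```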