I would like to know what is used to parallelize ML models and to optimize them.
DEEP LEARNING
When using Deep Learning, a GPU is usually used for training and inference (deployment). This is understandable, as GPUs have thousands of cores and Deep Learning works with tensors, so the work is parallelized on the GPU via CUDA. In the deployment phase, TensorRT or TensorFlow-TRT is used to optimize the layer graph (see the sketch after the list below):
- elimination of layers whose outputs are not used
- elimination of operations that are equivalent to no-ops
- fusion of convolution, bias and ReLU operations, etc.
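For illustration, this is the kind of conversion I have in mind. It is only a minimal sketch, assuming a TensorFlow 2.x SavedModel; the paths "./resnet_saved_model" and "./resnet_saved_model_trt" are hypothetical placeholders.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Build a converter for an existing SavedModel. TF-TRT rewrites the graph,
# replacing supported subgraphs with TensorRT engines (fusing conv + bias +
# ReLU, dropping unused layers and no-op operations), while unsupported ops
# stay in TensorFlow.
converter = trt.TrtGraphConverterV2(input_saved_model_dir="./resnet_saved_model")
converter.convert()                          # perform the graph optimizations
converter.save("./resnet_saved_model_trt")   # write the optimized SavedModel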
But in the case of classical Machine Learning, what do we have to parallelize and optimize?
MACHINE LEARNING
CPUs do not have as many cores as GPUs, but they have a larger memory capacity, and Machine Learning models are usually not as complex as Deep Learning ones.
Obviously, neither TensorRT nor TensorFlow-TRT will be used with SVM, Random Forest, etc., as there are no tensors. So, how can we optimize ML models? In scikit-learn there is "n_jobs" to parallelize (see the sketch below). But will it work when using GPUs? Or is it necessary to use CUDA? Or do ML algorithms need to be rewritten in order to be parallelized (see: PSVM: Parallelizing Support Vector Machines on Distributed Computers)?
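To make concrete what I mean by "n_jobs", here is a minimal sketch with a synthetic dataset; as far as I understand, n_jobs=-1 spreads the work over CPU cores (via joblib), not over a GPU, which is exactly what I am asking about.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, just for illustration.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# n_jobs=-1 builds the trees in parallel on all available CPU cores.
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))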
Thank you