I would like to know what is used to parallelize ML models and to optimize them.
DEEP LEARNING
When using Deep Learning, a GPU is usually used for training and inference (deployment). This is understandable, as GPUs have thousands of cores and Deep Learning works with tensors, so the work is parallelized on the GPU via CUDA. In the deployment phase, TensorRT or TensorFlow-TRT is used to optimize the layer graph (see the sketch after the list below):
- elimination of layers whose outputs are not used
- elimination of operations that are equivalent to no-ops
- fusion of convolution, bias and ReLU operations, etc.
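For illustration, this is the kind of conversion I have in mind. It is only a minimal sketch, assuming a TensorFlow 2.x SavedModel; the paths "./resnet_saved_model" and "./resnet_saved_model_trt" are hypothetical placeholders.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Build a converter for an existing SavedModel. TF-TRT rewrites the graph,
# replacing supported subgraphs with TensorRT engines (fusing conv + bias +
# ReLU, dropping unused layers and no-op operations), while unsupported ops
# stay in TensorFlow.
converter = trt.TrtGraphConverterV2(input_saved_model_dir="./resnet_saved_model")
converter.convert()                          # perform the graph optimizations
converter.save("./resnet_saved_model_trt")   # write the optimized SavedModel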
But in the case of classical Machine Learning, what do we have to parallelize and optimize?
MACHINE LEARNING
CPUs do not have as many cores as GPUs, but they have a larger memory capacity, and Machine Learning models are usually not as complex as Deep Learning ones.
Obviously, neither TensorRT nor TensorFlow-TRT will be used with SVM, Random Forest, etc., as there are no tensors. So, how can we optimize ML models? In scikit-learn there is "n_jobs" to parallelize (see the sketch below). But will it work when using GPUs? Or is it necessary to use CUDA? Or do ML algorithms need to be rewritten in order to be parallelized (see: PSVM: Parallelizing Support Vector Machines on Distributed Computers)?
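To make concrete what I mean by "n_jobs", here is a minimal sketch with a synthetic dataset; as far as I understand, n_jobs=-1 spreads the work over CPU cores (via joblib), not over a GPU, which is exactly what I am asking about.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, just for illustration.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# n_jobs=-1 builds the trees in parallel on all available CPU cores.
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))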
Thank you