Can I use just one GPU to train a model and predict images at the same time? I want to host a website for image predictions, so the GPU would be used for prediction persistently. At the same time, I may want to use it to train some models. Is that doable, or do I need two GPUs for these two tasks? Thank you in advance.
- You could delete it if you think it is an off-topic question. Thanks – Jane Jun 07 '18 at 22:24
1 Answer
This question isn't really suited for this site, as it mainly deals with architecture. However, the answer is yes, as long as your GPU has enough memory to host all the models. As an example, with an NVIDIA GPU you can instantiate an individual TensorFlow session for each model, and by limiting each session's resource use, they will all run on the same GPU. You can access them simultaneously as long as you're using multiple threads. If you want to retrain your model, though, you'd have to host a copy of it, save the weights after training, and then reload them in your prediction session.
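For what it's worth, here is a minimal sketch of the memory-capping idea using TensorFlow 1.x sessions; the memory fractions and variable names are illustrative, not prescriptive:

```python
import tensorflow as tf

def make_session(memory_fraction):
    """Create a session capped to a fraction of the GPU's memory."""
    config = tf.ConfigProto()
    # Cap this session's share of GPU memory so another session can coexist.
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    # Allocate memory lazily up to the cap instead of grabbing it all at once.
    config.gpu_options.allow_growth = True
    return tf.Session(config=config)

# Persistent session serving the website's prediction requests.
predict_sess = make_session(0.3)

# Separate session (or separate process) used for training on the same GPU.
train_sess = make_session(0.6)
```

With a setup like this, the prediction session stays resident while training runs alongside it; the kernels still share the device, so heavy training can slow prediction latency, but neither session will be evicted for lack of memory.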
On the other hand, you should take a look at services like AWS SageMaker, which handle autoscaling for both prediction and training and can (re)train the model without interrupting prediction. This will invariably use a second GPU instance for training, but it removes the headaches related to prediction downtime.

- Thank you for your quick reply. What if I don't use a TensorFlow session? For example, if I use either Darknet or Detectron, they are not written in TensorFlow. Could I still run training and testing at the same time? My question is whether your solution is framework-specific or framework-independent. Thank you so much. – Jane Jun 07 '18 at 18:02
- Do you mean the deep learning framework I use has to support a parallel working mode, like sessions in TensorFlow? What about two separate applications? Could I still run them on the same GPU? – Jane Jun 07 '18 at 18:59
- @Jane: While I'm not intimately familiar with them, if these packages use an NVIDIA GPU then they are somehow interfacing with CUDA, which should support sessionizing and resource management. – Alex R. Jun 07 '18 at 20:19
- Thank you. I found this post (https://stackoverflow.com/questions/31643570/running-more-than-one-cuda-applications-on-one-gpu). If my understanding is correct, the GPU still executes the work sequentially, which means prediction may slow down when a request arrives during training. Or is there a way to set priority among CUDA contexts? – Jane Jun 07 '18 at 21:04