When to stop training your machine learning regression model

Question

Maybe this question is a duplicate, but I am struggling to find the answer. I am training two models, a random forest and a linear regression, and I wonder when I should stop training. What is the heuristic for deciding that an MSE or MAE value is good enough? I know there are techniques such as validation curves and cross-validation, but they only tell me whether the model is overfitting; they do not imply that the model has reached its maximum quality.

Haitao Du · Accepted Answer · 2020-04-14T17:58:46.063

First, I would recommend reading this post:

How to know that your machine learning problem is hopeless?

Now, to answer your question:

It really depends on your data and the business needs. Suppose your data is really good and a simple linear regression achieves >90% accuracy, and this satisfies the business needs. Then we can stop and put the model into production. However, if it is a mission-critical task and the business needs >99% accuracy, then we still need to keep tuning the model.

On the other hand, the data may not be so good; for example, the features may have only very weak "correlations" with the prediction target. In that case, maybe the best model will reach just 30% accuracy. Depending on the task, the business may say that is OK: people may know the problem is hard, and if 30% accuracy is much better than random guessing, that is good enough.

BTW, I am talking about classification accuracy here, but the same applies to regression: the business may require specific values of R^2 or RMSE.
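To make the regression case concrete, here is a minimal sketch of checking a model against such targets. Everything in it is an assumption for illustration: the toy validation values, the helper names `rmse`, `r_squared` and `good_enough`, and the thresholds `max_rmse` and `min_r2`, which stand in for whatever numbers the business actually asks for. It also compares against a predict-the-mean baseline, the regression analogue of "better than random".

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def good_enough(y_true, y_pred, max_rmse, min_r2):
    """Hypothetical stopping rule: both business targets must be met."""
    return rmse(y_true, y_pred) <= max_rmse and r_squared(y_true, y_pred) >= min_r2

# Toy held-out targets and model predictions (made-up numbers).
y_val = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 6.5, 9.5]

# Naive baseline: always predict the mean of the validation targets.
baseline = [sum(y_val) / len(y_val)] * len(y_val)

print("model    RMSE:", rmse(y_val, y_hat), "R^2:", r_squared(y_val, y_hat))
print("baseline RMSE:", rmse(y_val, baseline), "R^2:", r_squared(y_val, baseline))
print("stop tuning?", good_enough(y_val, y_hat, max_rmse=1.0, min_r2=0.9))
```

If the check fails but the model is already well ahead of the baseline, the remaining question is whether the data can support the target at all, which is a conversation with the business rather than more tuning.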
