Machine learning models are fit to a response variable observed within a given range. This can lead to weak, and sometimes disastrous, performance on instances whose true response falls outside that range.
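To make the failure mode concrete, here is a minimal sketch (a toy example of my own; the data and model choice are assumptions, not taken from the links below) of a tree-based model trained on x in [0, 10] and queried well outside that range:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Train on x in [0, 10]; the true relationship is y = 2x plus noise.
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = 2 * X_train.ravel() + rng.normal(0, 0.5, size=200)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# A forest can only average leaf values it saw during training, so far
# outside the range it plateaus near max(y_train) ~ 20 instead of
# following the trend up to 30.
print(model.predict([[15.0]]))
```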
When the underlying mechanism (e.g., a physics-based formula) is known, an ML model performs better if it incorporates that formula as a descriptor (as pointed out by this answer). But we don't always have the luxury of knowing the underlying mechanism.
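A small sketch of what I mean by incorporating the formula as a descriptor (again a toy example; I assume the known mechanism is quadratic, e.g. a drag-like y = 0.5 v²):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Known mechanism: y = 0.5 * v**2 plus noise, observed only on [0, 10].
v_train = rng.uniform(0, 10, size=(200, 1))
y_train = 0.5 * v_train.ravel() ** 2 + rng.normal(0, 1.0, size=200)

# Raw feature only: the straight-line fit goes badly wrong off-range.
raw = LinearRegression().fit(v_train, y_train)

# Physics-informed: add v**2 as a descriptor, so the model's functional
# form matches the mechanism and extrapolation becomes interpolation
# in the transformed feature.
informed = LinearRegression().fit(np.hstack([v_train, v_train**2]), y_train)

v_new = np.array([[20.0]])  # well outside the training range; truth = 200
print(raw.predict(v_new))
print(informed.predict(np.hstack([v_new, v_new**2])))
```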
There are also examples of how poorly certain models extrapolate (here is a blog post comparing several models, and here is a related question sitting in the SE archive of unanswered favorites).
So the questions are:
1- Model selection: Are there established models that are less vulnerable to the extrapolation problem? (For example, are neural network models better at extrapolation than regression-based models? See the first sketch after this list.)
2- Diagnosis: Are there performance metrics (if any) specifically designed to characterize the extrapolation capability of a model? One obvious approach is to test the model on out-of-range instances and report the error (the second sketch below does this), but that is neither systematic nor statistically sound.
3- Improvement: Besides the obvious (expanding the range of the training set), are there ways to improve the extrapolation performance of a model? Biased sampling of the training set, or tweaking the loss function to penalize errors on extreme responses more heavily (the third sketch below tries the weighting idea), could potentially help. Are there systematic methods or published articles that provide guidance on this?
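For question 1, a quick comparison sketch (the model choices and data are my own assumptions) illustrating that extrapolation behaviour is tied to a model's functional form: a linear model carries its trend outward, a ReLU network extrapolates piecewise-linearly (with no guarantee the slope is right), and a forest plateaus:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = 2 * X.ravel() + rng.normal(0, 0.5, size=300)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "mlp": MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=0),
}

X_out = np.array([[15.0]])  # true value is 30
for name, m in models.items():
    m.fit(X, y)
    # The linear model follows the trend, the forest plateaus near
    # max(y), and the ReLU MLP extrapolates linearly but with a slope
    # that depends on how training went.
    print(name, m.predict(X_out))
```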
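For question 2, the "test on out-of-range instances" idea can at least be made repeatable by splitting on response quantiles instead of at random, so the held-out set is out of range by construction (a sketch of my own, not an established metric):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(500, 1))
y = 0.5 * X.ravel() + np.sin(X.ravel()) + rng.normal(0, 0.1, size=500)

# Hold out the top 20% of the response as an out-of-range test set,
# plus a random slice of the remainder as an in-range control.
cut = np.quantile(y, 0.8)
X_in, y_in = X[y < cut], y[y < cut]
X_out, y_out = X[y >= cut], y[y >= cut]
X_tr, X_te, y_tr, y_te = train_test_split(X_in, y_in, test_size=0.25, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("in-range MAE:     ", mean_absolute_error(y_te, model.predict(X_te)))
print("out-of-range MAE: ", mean_absolute_error(y_out, model.predict(X_out)))
```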
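For question 3, the loss-tweaking idea could look like the following (sample weights that grow with how extreme the response is; whether this actually helps off-range is exactly what I am asking):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(500, 1))
y = 0.5 * X.ravel() ** 1.5 + rng.normal(0, 0.5, size=500)

# Up-weight instances with extreme responses so the fitting loss pays
# more attention to the tails of the response distribution.
weights = 1.0 + 2.0 * np.abs(y - y.mean()) / y.std()

plain = GradientBoostingRegressor(random_state=0).fit(X, y)
weighted = GradientBoostingRegressor(random_state=0).fit(X, y, sample_weight=weights)

x_edge = np.array([[9.9]])  # near the edge of the training range
print(plain.predict(x_edge), weighted.predict(x_edge))
```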