Suppose we have a machine learning model with good cross-validation and test scores.
How can we estimate whether a new input instance lies within the domain of data where the model's predictions are dependable?
To give an example: it is impossible to train a self-driving car on data from every situation it will ever face. How can we identify situations in which the model will not work well?
I can imagine simple methods for estimating whether new data is similar to the training data, e.g., based on nearest-neighbor distance (a rough sketch is below). However, such a solution might not account for feature importance and would therefore not be optimal.
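For concreteness, here is a minimal sketch of the nearest-neighbor idea I have in mind, using scikit-learn's `NearestNeighbors`. The synthetic data and the 95th-percentile cut-off are placeholders I made up for illustration, not a recommendation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Stand-in for the real training features the model was fitted on.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 5))
nn = NearestNeighbors(n_neighbors=1).fit(X_train)

# Distance of each training point to its nearest *other* training point
# (query with k=2 and drop the zero self-distance in column 0).
d_train, _ = nn.kneighbors(X_train, n_neighbors=2)
threshold = np.quantile(d_train[:, 1], 0.95)  # arbitrary cut-off, for illustration only

# Flag new instances whose nearest-neighbor distance exceeds the threshold.
X_new = rng.normal(size=(10, 5)) + 3.0        # deliberately shifted points
d_new, _ = nn.kneighbors(X_new, n_neighbors=1)
out_of_domain = d_new[:, 0] > threshold
print(out_of_domain)
```

This treats all features equally, which is exactly the weakness I mentioned: a feature the model barely uses counts as much toward the distance as the most important one.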
I am looking for a more systematic discussion of solutions to this question.