I'm hoping to use machine learning to predict chemical properties of various molecules. Many chemistry machine learning research papers that I come across talk about model generalizability issues related to new molecules that are not "in-distribution" with training data.
To give an "out of distribution" example in layman's terms from an entirely different field, it's like training a self-driving car on a highway with well-marked lanes, and expecting that model to know what to do when you deploy it on a golf cart in the middle of a putting green.
This particular problem in chemistry is common because we run expensive experiments on molecules that we've never seen before -- we don't bother running experiments on molecules we already know well. However, a successful chemistry predictive model would be useful because it might help avoid costly experiments whose outcomes might be obvious to the model but not to the chemist.
In chemistry, or for that matter in any domain where the representativeness of the training data is not obvious, what are some techniques to measure the degree of novelty of a new record (i.e., a molecule) at inference time? For what it's worth, I'm not using simple models like linear or logistic regression, but rather random forests and deep learning techniques.
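To make the question concrete, here is the kind of naive baseline I can imagine: score a new record by its distance to its nearest neighbors in the training set, so that points far from all training data get a high "novelty" score. This sketch uses a hypothetical random feature matrix standing in for molecular descriptors; I'm assuming scikit-learn's `NearestNeighbors` here, though I'd welcome more principled alternatives.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Hypothetical stand-in for molecular descriptors / fingerprints
X_train = rng.normal(0.0, 1.0, size=(500, 16))

# Fit k-NN on the training features; novelty = mean distance
# from a query point to its k nearest training points
k = 5
nn = NearestNeighbors(n_neighbors=k).fit(X_train)

def novelty_score(X_new):
    dists, _ = nn.kneighbors(X_new)
    return dists.mean(axis=1)

# In-distribution queries vs. a shifted, "out of distribution" batch
X_in = rng.normal(0.0, 1.0, size=(10, 16))
X_out = rng.normal(6.0, 1.0, size=(10, 16))

print(novelty_score(X_in).mean())   # small: close to training data
print(novelty_score(X_out).mean())  # large: far from training data
```

I realize this ignores issues like the curse of dimensionality and the choice of molecular representation, which is partly why I'm asking what people actually use in practice.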