
I am trying to fit a nonlinear regression model to a set of data points that I know is incomplete. When I visualize the data, the relationship between my features and the dependent variables looks quite simple (roughly a third-degree polynomial). Within the range of the data, I find little difference in out-of-sample predictive power between ANN, SVR, boosted trees, etc.
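As a rough illustration of the "simple cubic" case: if the true relationship really is a cubic polynomial (an assumption that the data cannot confirm outside its own range), a plain least-squares cubic fit extrapolates along that same curve. The sketch below uses only the Python standard library; `fit_polynomial`, `predict`, and the data are hypothetical. It solves the normal equations directly, which is fine for a cubic on a modest range but numerically fragile for higher degrees.

```python
def fit_polynomial(xs, ys, degree=3):
    """Return coefficients [c0, c1, ..., c_degree] minimizing squared error."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = xs[i] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    # Back-substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        s = aty[row] - sum(ata[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = s / ata[row][row]
    return coeffs


def predict(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical data: an exact cubic y = 1 - 2x + 0.5x^3 sampled on [-2, 2].
xs = [i / 10 for i in range(-20, 21)]
ys = [1 - 2 * x + 0.5 * x ** 3 for x in xs]
coeffs = fit_polynomial(xs, ys)
# Outside [-2, 2] the fit keeps following the cubic (true value at 5.0 is 53.5).
print(predict(coeffs, 5.0))
```

This only "works" because the model family matches the (assumed) truth; if the real curve bends differently beyond the sampled range, the fit will be confidently wrong.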

However, if I expect to encounter points outside the range of my sample, which classes of models should I use for better performance? Intuitively, it seems that trees should be avoided entirely; is that correct? With SVR, might forcing C to be low be the best among bad choices? Are there any theoretical insights or best practices for this?
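One way to make the SVR intuition concrete, assuming the standard ε-SVR dual formulation (the one used by libsvm and scikit-learn's `SVR`): the fitted function has the form

$$
f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, k(x_i, x) + b, \qquad 0 \le \alpha_i, \alpha_i^{*} \le C.
$$

With the RBF kernel $k(x_i, x) = \exp(-\gamma \lVert x - x_i \rVert^2)$, every kernel term decays to 0 as $x$ moves far from all training points, so $f(x) \to b$ whatever value of $C$ is used; $C$ only caps the dual coefficients, which mainly affects the fit inside the data's range. With a polynomial kernel $k(x_i, x) = (\gamma\, x_i^\top x + r)^d$, $f$ is a polynomial of degree at most $d$ everywhere, so a degree-3 kernel would extrapolate like a cubic, for better or worse.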

hjw
  • If you have domain knowledge about what you can expect, then yes, you could choose a better option. If you have no knowledge about what to expect, then it's anybody's guess. – user2974951 Dec 18 '18 at 08:18
  • Tree-based ensemble models (like Random Forests) will bound predictions to the training-set range of values. So yes, sometimes this is something you don't want. – daruma Oct 06 '21 at 23:57
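A minimal sketch of why tree ensembles cannot extrapolate: every leaf predicts the mean of some training targets, so each tree (and any average of trees) stays within [min y, max y], and any x beyond the largest training x lands in the rightmost leaf of every tree, giving a constant prediction. The code is a hypothetical, stdlib-only bagged-tree regressor on one feature (with a single feature, a random forest's feature subsampling has nothing to choose from, so it reduces to bagging); it is not scikit-learn's implementation, and all names and data are illustrative.

```python
import random


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, max_depth=4, min_leaf=2):
    """Grow a 1-D regression tree; leaves store the mean target of their region."""
    mean = sum(ys) / len(ys)
    if max_depth == 0 or len(xs) < 2 * min_leaf:
        return mean
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        sse = _sse([y for _, y in pairs[:i]]) + _sse([y for _, y in pairs[i:]])
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:
        return mean
    _, threshold, i = best
    left, right = pairs[:i], pairs[i:]
    return (
        threshold,
        build_tree([x for x, _ in left], [y for _, y in left], max_depth - 1, min_leaf),
        build_tree([x for x, _ in right], [y for _, y in right], max_depth - 1, min_leaf),
    )


def tree_predict(node, x):
    # Internal nodes are (threshold, left, right); leaves are plain floats.
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x <= threshold else right
    return node


def fit_bagged_trees(xs, ys, n_trees=20, seed=0):
    """Fit each tree on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([xs[j] for j in idx], [ys[j] for j in idx]))
    return trees


def forest_predict(trees, x):
    """Average the per-tree predictions."""
    return sum(tree_predict(t, x) for t in trees) / len(trees)


# Hypothetical data: y = x^3 sampled on [-2, 2], so training targets lie in [-8, 8].
xs = [i / 10 for i in range(-20, 21)]
ys = [x ** 3 for x in xs]
forest = fit_bagged_trees(xs, ys)
for x in (1.5, 5.0, 50.0):
    print(x, round(forest_predict(forest, x), 3), x ** 3)
```

On this data the predictions at 5.0 and 50.0 are identical and no larger than 8, while the true values are 125 and 125000.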
