
Let's consider a simple regression problem in which we have only one real-valued feature and one real-valued target.

We try to fit the data with a polynomial function, and we also want to use the given data to determine which order of polynomial is best (given the amount of data and the amount of noise in the data).

The standard approach is to run cross-validation (e.g. leave-one-out) and choose the order of the polynomial that gives the best (smallest) cross-validation error.

However, I would like a procedure that accepts a more complex model (a higher-order polynomial) only if there is strong statistical evidence for it.

In more detail, I first assume that the target does not depend on the feature at all (I call this the "constant model"). Then I calculate the leave-one-out cross-validation (LOOCV) error of this constant model and compare it with the LOOCV error of the linear model. Now, if the LOOCV error of the linear model is better (smaller) than the LOOCV error of the constant model, I do not automatically replace the constant model with the linear model (as I would in standard LOOCV). Instead, I first ask how probable it is for the LOOCV error of a linear model to beat the LOOCV error of the constant model under the assumption that the constant model is correct (null hypothesis testing). Then I accept the linear model only if this probability is small enough (say, 0.01).
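One way to make this step concrete, under the assumption of i.i.d. Gaussian noise (my assumption, not something stated above), is a Monte Carlo test: simulate many data sets from the fitted constant model and count how often the linear model's LOOCV error beats the constant model's by at least as much as it did on the real data. A sketch, written for a general pair of nested polynomial degrees so that the constant-vs-linear step is `d_simple=0, d_complex=1`:

```python
import numpy as np

def loocv_mse(x, y, degree):
    """Leave-one-out CV mean squared error of a polynomial fit of the given degree."""
    n = len(x)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        coefs = np.polyfit(x[mask], y[mask], degree)
        errs[i] = (np.polyval(coefs, x[i]) - y[i]) ** 2
    return errs.mean()

def p_value_vs_simpler(x, y, d_simple, d_complex, n_sim=1000, seed=0):
    """Monte Carlo p-value for 'the degree-d_complex model beats the degree-d_simple
    model in LOOCV', under the null that the degree-d_simple model plus i.i.d.
    Gaussian noise is correct (the Gaussian assumption is mine)."""
    rng = np.random.default_rng(seed)
    # Observed LOOCV improvement of the more complex model on the real data.
    observed_gain = loocv_mse(x, y, d_simple) - loocv_mse(x, y, d_complex)
    # Fit the null model and estimate the noise level from its residuals.
    null_coefs = np.polyfit(x, y, d_simple)
    resid = y - np.polyval(null_coefs, x)
    sigma = resid.std(ddof=d_simple + 1)
    # Simulate data sets from the fitted null model and record how often the
    # complex model wins by at least as much as it did on the real data.
    gains = np.empty(n_sim)
    for s in range(n_sim):
        y_sim = np.polyval(null_coefs, x) + rng.normal(0.0, sigma, size=len(x))
        gains[s] = loocv_mse(x, y_sim, d_simple) - loocv_mse(x, y_sim, d_complex)
    return (gains >= observed_gain).mean()
```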

Now I repeat the procedure with the quadratic model. If the LOOCV error of the quadratic model is smaller than the LOOCV error of the linear model, I do not automatically accept the quadratic model. As before, I first check how probable it is for the quadratic model to beat the linear model under the assumption that the linear model is correct. And, as before, I accept the quadratic model only if this probability is small enough.
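Wrapping the sketch above into the sequential procedure: start from degree 0 and move to the next degree only while the Monte Carlo p-value from `p_value_vs_simpler` stays below the threshold. The stopping rule (stop at the first non-significant step) is my reading of the question, not the only possibility:

```python
def select_degree(x, y, max_degree=5, alpha=0.01, n_sim=1000):
    """Forward selection over polynomial degrees: accept degree d+1 over degree d
    only if the Monte Carlo p-value from p_value_vs_simpler is below alpha."""
    degree = 0
    while degree < max_degree:
        p = p_value_vs_simpler(x, y, degree, degree + 1, n_sim=n_sim)
        if p >= alpha:
            break  # no strong evidence for the more complex model; stop here
        degree += 1
    return degree
```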

Is there a meaningful way to implement this idea? How can I calculate the above-mentioned probabilities?

ADDED

Maybe LOOCV is not even needed for the described approach. For example, we can fit the data set with a linear function and then with a quadratic function. Obviously, the in-sample error of the quadratic fit will be smaller. But we ask the question: is the quadratic error small enough to be very improbable under the assumption that the real dependency is linear?
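For this in-sample version, under the usual Gaussian-noise assumption, the question "is the quadratic improvement too large to be plausible if the true dependency is linear?" is exactly what the classical partial F-test for nested linear models answers. A minimal sketch (assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy import stats

def nested_f_test(x, y, d_simple, d_complex):
    """Partial F-test for nested polynomial fits: p-value of the in-sample
    improvement of the degree-d_complex model under the null that the
    degree-d_simple model plus i.i.d. Gaussian noise is correct."""
    n = len(x)
    rss_simple = np.sum((y - np.polyval(np.polyfit(x, y, d_simple), x)) ** 2)
    rss_complex = np.sum((y - np.polyval(np.polyfit(x, y, d_complex), x)) ** 2)
    df_extra = d_complex - d_simple            # number of extra coefficients
    df_resid = n - (d_complex + 1)             # residual df of the larger model
    f_stat = ((rss_simple - rss_complex) / df_extra) / (rss_complex / df_resid)
    return stats.f.sf(f_stat, df_extra, df_resid)

# Example: evidence for quadratic over linear.
# p = nested_f_test(x, y, d_simple=1, d_complex=2)
```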

Roman

2 Answers


The straightforward approach would be to compute the Bayes factor (and possibly use prior beliefs to arrive at a posterior probability), as mentioned in my answer to the previous question.

There is a good paper by David Barber on this topic, for the classification case, and it is also discussed in his book "Bayesian Reasoning and Machine Learning".
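As a rough illustration only (not the method from Barber's paper), the Bayes factor between two nested polynomial models can be approximated from the difference of their BIC values; the Gaussian-noise likelihood and the BIC approximation are assumptions of this sketch:

```python
import numpy as np

def bic_polynomial(x, y, degree):
    """BIC of a degree-`degree` polynomial fit with Gaussian noise
    (noise variance estimated by maximum likelihood)."""
    n = len(x)
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    sigma2 = np.mean(resid ** 2)                     # MLE of the noise variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = degree + 2                                   # coefficients + noise variance
    return k * np.log(n) - 2 * log_lik

def approx_bayes_factor(x, y, d_simple, d_complex):
    """Crude BIC-based approximation of the Bayes factor in favour of the
    more complex model: exp((BIC_simple - BIC_complex) / 2)."""
    return np.exp(0.5 * (bic_polynomial(x, y, d_simple) - bic_polynomial(x, y, d_complex)))
```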

Dikran Marsupial

Based on the procedure you described, it seems like you could run a forward stepwise regression over your nested models while also keeping track of the out-of-sample LOOCV error of each. Then you can choose a reasonably simple model with an acceptable LOOCV error.
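A possible sketch of that bookkeeping with scikit-learn (the pipeline and the choice of scorer are mine, not something prescribed by the procedure above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def loocv_error(x, y, degree):
    """LOOCV mean squared error of a polynomial regression of the given degree."""
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, x.reshape(-1, 1), y,
                             cv=LeaveOneOut(), scoring="neg_mean_squared_error")
    return -scores.mean()

# Forward pass over the nested polynomial models, keeping the LOOCV error of each,
# so you can pick the simplest degree whose error is close enough to the minimum.
# errors = {d: loocv_error(x, y, d) for d in range(6)}
```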

Ozan