
What is the cost of nested cross-validation in terms of the number of times the algorithm needs to perform a fit-evaluate step?

Based on this description of the algorithm, I think the answer is:

$n \times k \times j \times m$

Where:

  1. $n$ is the number of models you are evaluating.
  2. $k$ is the number of folds in the outer cross-validation.
  3. $j$ is the number of folds in the inner cross-validation.
  4. $m$ is the number of combinations of hyperparameters to compare at each stage.
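To make the counting concrete, here is a minimal sketch of the loop structure I have in mind (the sizes are made up purely for illustration); each innermost iteration is one fit-evaluate step:

```python
# Illustrative sizes only; not taken from any particular source.
n = 3   # candidate models
k = 5   # outer CV folds
j = 5   # inner CV folds
m = 20  # hyperparameter combinations per model

fits = 0
for model in range(n):                   # evaluate each candidate model
    for outer_fold in range(k):          # outer CV split
        for combo in range(m):           # each hyperparameter combination
            for inner_fold in range(j):  # inner CV split
                fits += 1                # one fit-evaluate step

assert fits == n * k * j * m             # 3 * 5 * 20 * 5 = 1500
```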

I am a little confused about this. This article and this one seem to indicate a cost of:

$n \times k^2$

I have no problem understanding that the $k^2$ is really just a special case of $j \times k$ where $j = k$, as I'm sure it often is. But what about the number of different hyperparameter combinations that need to be evaluated in the inner cross-validation, which I am calling $m$? Is there a reason that $m$ is left out of the $n \times k^2$ cost calculation, e.g. because $m$ is usually small, or did the authors simply forget to include it?

1 Answer

I'm afraid not all sources mean exactly the same thing by nested CV. Your first link uses the outer loop for performance evaluation, as in the sklearn documentation: the outer loop's mission is to provide an evaluation of your whole procedure. However, in the second and third links (which are very similar, despite their different lengths), the outer loop's mission is to select an algorithm, and the inner loop's mission is to optimize each and every algorithm's hyperparameters. So, it's not meaningful to compare the three sources' formulas, especially since the definition of $n$ is not consistent among them.
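For the first interpretation (outer loop for evaluation, inner loop for HPO), a minimal sketch following the sklearn nested-CV pattern might look like this; the dataset, model, and grid are placeholder assumptions, not taken from the links above:

```python
# Inner GridSearchCV does the hyperparameter optimization; the outer
# cross_val_score evaluates the whole tuning procedure.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}  # m = 6 combinations

inner = GridSearchCV(SVC(), param_grid, cv=5)  # inner loop: HPO
scores = cross_val_score(inner, X, y, cv=5)    # outer loop: evaluation
print(scores.mean())
```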

Conceptually, model selection and HPO are the same thing, which means that, with your definitions, the total number of model configurations is $n \times m$. So, if you do a nested CV for both model selection and HPO, each with $k$ folds, your complexity will be $n \times m \times k^2$. A sensible way to decrease this complexity is to flatten all your models and HP configurations into one list and try them all in one CV loop (i.e. $n \times m$ models), which can return the best model together with its chosen configuration, as sketched below.
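One way to realize that flattening in sklearn is to make the estimator itself a searchable pipeline step, so that a single CV loop covers all $n \times m$ configurations at once; the estimators and grids below are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("clf", SVC())])  # placeholder step; the grid swaps it out

# One flat list: each dict pairs a candidate model with its own HP grid.
param_grid = [
    {"clf": [SVC()], "clf__C": [0.1, 1, 10]},
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=5)  # n * m models, one CV loop
search.fit(X, y)
print(search.best_params_)
```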

If you were to also add a cross-validated evaluation layer on top of your original CV configuration (instead of a single test set), as is done in the sklearn documentation and the first post, again with $k$ folds, the complexity would be $n \times m \times k^3$.
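For example (with made-up sizes), $n = 3$ models, $m = 20$ hyperparameter combinations, and $k = 5$ folds at every level give $3 \times 20 \times 5^3 = 7500$ fit-evaluate steps.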

So, it really depends on your CV loop definition, which, as I mentioned, changes depending on the source you use.
