I'm currently working through a machine learning textbook and just read a bit about $k$-fold cross-validation, and I am wondering about the following. I want to estimate a tuning parameter, e.g. the penalty parameter $\lambda$ in a penalized likelihood method. To do this, I can see two different approaches:
I partition the training data into $k$ equally sized folds, and for each fold I fit the model on the other $k-1$ folds, predict $y$ on the held-out fold, and compare these predictions with the actual $y$ values in that fold. I do this for every interesting choice of $\lambda$, and choose the value with the smallest error, averaged over all folds and all observations within each fold.
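Concretely, here is a minimal sketch of what I mean, using ridge regression from scikit-learn as a stand-in penalized method (its `alpha` plays the role of $\lambda$); the dataset, the $\lambda$ grid, and $k=5$ are made up for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Made-up data and penalty grid, just for illustration.
X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)
lambdas = np.logspace(-3, 3, 13)

# k-fold CV: each fold is held out once, the model is fit on the rest.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
mean_errors = []
for lam in lambdas:
    fold_errors = []
    for train_idx, test_idx in kf.split(X):
        model = Ridge(alpha=lam).fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        fold_errors.append(np.mean((pred - y[test_idx]) ** 2))
    # Average the per-fold mean squared errors over all k folds.
    mean_errors.append(np.mean(fold_errors))

best_lambda = lambdas[int(np.argmin(mean_errors))]
```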
I randomly split the training data into 2 equally large sets, fit the model on one set, and compute the prediction error on the other set. For every interesting $\lambda$, I note this error. Then I re-sample the data into 2 (different) equally large sets and repeat the above procedure. I split $k$ times in total, and average the errors over these splits to arrive at the best parameter.
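In code, the second approach would look something like this sketch (same made-up data and grid as above; `ShuffleSplit` draws the repeated random 50/50 splits):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import ShuffleSplit

# Same made-up data and penalty grid as in the first sketch.
X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)
lambdas = np.logspace(-3, 3, 13)

# k independent re-samplings into two equally large halves:
# fit on one half, score on the other, average over the k repeats.
splits = ShuffleSplit(n_splits=5, test_size=0.5, random_state=0)
mean_errors = []
for lam in lambdas:
    errors = []
    for train_idx, test_idx in splits.split(X):
        model = Ridge(alpha=lam).fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    mean_errors.append(np.mean(errors))

best_lambda = lambdas[int(np.argmin(mean_errors))]
```

The only structural difference I can see is that in the first sketch every observation is held out exactly once, while in the second the random halves can overlap across repeats, so some observations may be held out several times and others never.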
The second approach looks rather naive, and I am wondering if there is something wrong with it. Are there reasons, generally speaking, why one would prefer method 1 over method 2? Are there computational reasons, or even statistical ones?