Suppose I have some data which I have split into $k$ folds (where $k$ is less than the number of data points $n$). I train the model on the $k-1$ training folds and want to test on the remaining fold.
With $k$ folds, where $k < n$ (so each validation fold contains multiple data points), I believe the following is correct: for each fold I calculate the RMSE $\sqrt{\frac{1}{m}\sum_{i=1}^m (\hat{y}_i - y_i)^2}$, where $m$ is the number of data points in the fold. I then average these $k$ fold errors to get the error for that model. I can then repeat this for different tuning-parameter values to see which value gives me the lowest overall error.
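To make the procedure concrete, here is a minimal sketch of the per-fold RMSE averaging, using made-up predictions and targets for $k = 3$ folds of $m = 2$ points each (all numbers are hypothetical):

```python
import math

def rmse(preds, actuals):
    """Root mean squared error over one validation fold."""
    m = len(actuals)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actuals)) / m)

# Hypothetical (predictions, targets) pairs, one per fold
folds = [
    ([1.0, 2.0], [1.5, 1.5]),
    ([3.0, 4.0], [2.5, 4.5]),
    ([5.0, 6.0], [5.0, 7.0]),
]

fold_rmses = [rmse(p, a) for p, a in folds]

# Cross-validation error for this model: the average of the k fold RMSEs
cv_error = sum(fold_rmses) / len(fold_rmses)
```

Repeating this for each candidate tuning-parameter value and comparing the resulting `cv_error` values is the selection step described above.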
My question is then: what do you do when $k = n$, i.e. leave-one-out cross-validation (LOOCV)? There are two different approaches I can think of.
Approach 1: The same as above. However, since each validation set now consists of only a single data point ($m = 1$), the per-fold error reduces to $\sqrt{\frac{1}{m}\sum_{i=1}^m (\hat{y}_i - y_i)^2} = \sqrt{(\hat{y} - y)^2} = |\hat{y} - y|$. I can then average these over all $n$ folds. This would be identical to calculating the mean absolute error (MAE).
Approach 2: Pool the errors instead: square each single-point error, average the squared errors across the $n$ validation sets, and then take the square root. This gives a single overall RMSE, which wouldn't necessarily equal the MAE.
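To see that the two approaches genuinely differ, here is a small sketch with made-up LOOCV residuals (one held-out prediction error per fold, $n = 4$; the numbers are hypothetical):

```python
import math

# Hypothetical held-out residuals (prediction - truth), one per LOOCV fold
residuals = [0.5, -1.0, 2.0, -0.5]
n = len(residuals)

# Approach 1: the RMSE of each single-point fold is |error|,
# so averaging the fold errors yields the mean absolute error (MAE)
approach_1 = sum(abs(r) for r in residuals) / n

# Approach 2: pool the squared errors across folds, average,
# then take the square root -- a single overall RMSE
approach_2 = math.sqrt(sum(r ** 2 for r in residuals) / n)
```

On these residuals the two summaries disagree (the pooled RMSE weights the large residual `2.0` more heavily than the MAE does), which is exactly the distinction the question is asking about.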
Is either of these the correct approach? And if so, why is one used over the other?