In "Cross-Validation Methods. Journal of mathematical psychology, Vol. 44, No. 1. (March 2000), pp. 108-132", Professor Browne pointed out that single sample cross-validation index and the Akaike information criterion are equivalent. If so, what are the indications for the more laborious cross-validation in prediction?
-
Computing a *single*-sample cross-validation is not laborious (there is an explicit formula involving the diagonal of the hat matrix). – user603 May 02 '13 at 08:45
1 Answer
You only ever need to fit the full model once for cross-validation: the results of that single fit let you work out the residuals from predicting any held-out subset. Suppose you consider a specific group of $m$ observations, with $n-m\geq p$, where $n$ is the number of samples and $p$ is the number of regression coefficients. The standard least squares solution using all the data is $b=(X^TX)^{-1}X^TY$. Now let the $m$ removed samples form the $m\times p$ matrix $Z$, and let the corresponding observed responses be the $m\times 1$ vector $W$. We can write the "out of sample" prediction for $W$ as follows:
$$Zb_{-Z}=Z(X^TX-Z^TZ)^{-1}(X^TY-Z^TW)$$
That is, we subtract the contribution of the $m$ points from the full dataset. Next we apply the blockwise inversion formula with $A=X^TX,\;B=Z^T,\;C=Z,\;D=I_m$. After some tedious manipulation we get
$$Zb_{-Z}=(I_m-H_Z)^{-1} (Zb- H_ZW)$$
where $H_Z=Z(X^TX)^{-1}Z^T$. Finally, the "leave $m$ out" residuals are given by $$W-Zb_{-Z}=(I_m-H_Z)^{-1}(W-Zb)=(I_m-H_Z)^{-1}e_Z$$ where $e_Z$ contains the residuals for the $m$ samples when they are included in the model. A quick numerical check of this identity is sketched below.
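For concreteness, here is a minimal numerical sketch of that identity (not part of the original answer): it simulates data, computes the leave-$m$-out residuals via the shortcut above, and compares them against brute-force refitting without the $m$ rows. The variable names (`X`, `Y`, `Z`, `W`, `b`, `m`) mirror the notation used here.

```python
import numpy as np

# Minimal sketch: check W - Z b_{-Z} = (I_m - H_Z)^{-1} (W - Z b)
# on simulated data; notation follows the answer above.
rng = np.random.default_rng(0)
n, p, m = 50, 3, 5
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal(p) + rng.standard_normal(n)

Z, W = X[:m], Y[:m]                        # the m held-out rows and responses
b = np.linalg.solve(X.T @ X, X.T @ Y)      # full-data least squares fit

# Shortcut: leave-m-out residuals from the full fit alone
H_Z = Z @ np.linalg.solve(X.T @ X, Z.T)    # H_Z = Z (X'X)^{-1} Z'
e_Z = W - Z @ b                            # in-sample residuals for the m rows
shortcut = np.linalg.solve(np.eye(m) - H_Z, e_Z)

# Brute force: actually refit with the m rows removed
b_minus = np.linalg.lstsq(X[m:], Y[m:], rcond=None)[0]
brute = W - Z @ b_minus

print(np.allclose(shortcut, brute))        # True
```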
Taking the sum of squares of these leave-$m$-out residuals gives $$e_{Z}^T(I_m-H_Z)^{-2}e_Z$$ The idea is then to take all ${n\choose m}$ possible choices of $Z$ from the sample. But this number grows like $O(n^m)$, so the computation is infeasible for all but very small $m$. For $m=1$ we get the PRESS statistic: $$\sum_i\frac{e_i^2}{(1-h_{ii})^2}$$ Taking $n$ times the log and using the approximations $h_{ii}\approx\frac{p}{n}$, $(1-q)^{-2}\approx 1+2q$, and $\log(1+x)\approx x$, we get $$n\log\left(\sum_ie_i^2\left(1+2\frac{p}{n}\right)\right)=n\log\left(\sum_ie_i^2\right)+n\log\left(1+2\frac{p}{n}\right)\approx n\log\left(\sum_ie_i^2\right)+2p,$$ which is AIC up to additive constants that do not depend on the model.
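To see that PRESS/AIC connection numerically, here is a rough sketch (again not from the original post): for a sequence of nested models on simulated data, it computes $n\log(\text{PRESS})$ via the hat-matrix diagonal alongside $n\log(\sum_ie_i^2)+2p$. The data-generating choices (6 candidate predictors, 3 truly active) are illustrative assumptions.

```python
import numpy as np

# Rough sketch: compare n*log(PRESS) with n*log(RSS) + 2p across nested
# models; the true model uses the first 3 of 6 candidate predictors.
rng = np.random.default_rng(1)
n = 200
X_full = rng.standard_normal((n, 6))
Y = X_full[:, :3] @ np.array([1.0, -0.5, 0.25]) + rng.standard_normal(n)

for p in range(1, 7):
    X = X_full[:, :p]
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix for this model
    e = Y - H @ Y                           # in-sample residuals
    h = np.diag(H)
    press = np.sum((e / (1 - h)) ** 2)      # leave-one-out SSE via hat diagonal
    aic_like = n * np.log(np.sum(e ** 2)) + 2 * p   # AIC up to constants
    print(p, round(n * np.log(press), 1), round(aic_like, 1))
```

The two criteria differ only by terms that are constant across the candidate models, so they rank the models the same way and both bottom out near the true model size.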

-
@probabilitylogic: Thank you for your answer. A previous discussion (http://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other) stated that "AIC is best for prediction as it is asymptotically equivalent to cross-validation." and that "AIC is equivalent to leave-one-out cross-validation." If so, can I use AIC alone (instead of cross-validation) to fight against over-fitting in prediction? – KuJ May 03 '13 at 15:44
-
@GuhJY That discussion is probably referring to [Stone. 1977. *An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion*](http://www.jstor.org/stable/2984877). – fileunderwater Jan 26 '15 at 09:28