
In "Cross-Validation Methods. Journal of mathematical psychology, Vol. 44, No. 1. (March 2000), pp. 108-132", Professor Browne pointed out that single sample cross-validation index and the Akaike information criterion are equivalent. If so, what are the indications for the more laborious cross-validation in prediction?

Momo
KuJ
  • computing a *single* sample cross-validation is not laborious (there is an explicit formula involving the diagonal of the hat matrix) – user603 May 02 '13 at 08:45

1 Answer


You only ever really need to fit the full model once for cross-validation; you can use the results from the full fit to work out the residuals from predicting any held-out subset. Suppose you remove a specific group of $m$ observations, with $n-m\geq p$, where $n$ is the number of samples and $p$ is the number of betas. The standard least squares solution using all the data is $b=(X^TX)^{-1}X^TY$. Now let the $m$ removed samples form the rows of the $m\times p$ matrix $Z$, and let the corresponding observed responses form the $m\times 1$ vector $W$. We can then write the "out of sample" prediction for $W$ as follows:

$$Zb_{-Z}=Z(X^TX-Z^TZ)^{-1}(X^TY-Z^TW)$$

That is, we subtract the contribution of the $m$ points from the full dataset. Next we use the blockwise inversion formula, setting $X^TX=A,\;Z^T=B,\;Z=C,\;D=I_m$. After some tedious manipulation we get

$$Zb_{-Z}=(I_m-H_Z)^{-1} (Zb- H_ZW)$$

where $H_Z=Z(X^TX)^{-1}Z^T$. Finally, the "leave-$m$-out" residuals are given by $$W-Zb_{-Z}=(I_m-H_Z)^{-1}(W-Zb)=(I_m-H_Z)^{-1}e_{Z},$$ where $e_Z$ is the vector of residuals for the $m$ samples when they are included in the model.
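This shortcut is easy to check numerically. Below is a minimal sketch in Python/numpy on simulated data (the setup and all variable names are my own, purely for illustration): it computes the leave-$m$-out residuals once via $(I_m-H_Z)^{-1}e_Z$ from the single full fit, and once by actually refitting without the $m$ points.

```python
import numpy as np

# Minimal numerical check of the leave-m-out shortcut on simulated data.
rng = np.random.default_rng(0)
n, p, m = 50, 4, 3

X = rng.standard_normal((n, p))              # full n x p design matrix
Y = X @ rng.standard_normal(p) + rng.standard_normal(n)
Z, W = X[:m], Y[:m]                          # the m held-out rows and responses

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ (X.T @ Y)                      # full-data least squares fit
e_Z = W - Z @ b                              # in-sample residuals of the m points
H_Z = Z @ XtX_inv @ Z.T                      # m x m block of the hat matrix

# Shortcut: leave-m-out residuals computed from the full fit alone
shortcut = np.linalg.solve(np.eye(m) - H_Z, e_Z)

# Direct route: actually refit without the m points, then predict them
b_minus = np.linalg.lstsq(X[m:], Y[m:], rcond=None)[0]
direct = W - Z @ b_minus

print(np.allclose(shortcut, direct))         # True
```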

Taking the sum of squares of these residuals gives

$$e_{Z}^T (I_m-H_Z)^{-1} (I_m-H_Z)^{-1} e_Z$$

The idea is then to take all ${n\choose m}$ possible choices of $Z$ from the sample. But the number of combinations grows like $O(n^m)$, which is infeasible for all but very small $m$. For $m=1$ we get the PRESS statistic:

$$\sum_i\frac{e_i^2}{(1-h_{ii})^2}$$

Taking $n$ times the log and using the approximations $h_{ii}\approx\frac{p}{n}$, $(1-q)^{-2}\approx 1+2q$, and $\log(1+x)\approx x$, we get

$$n\log\left(\sum_ie_i^2\Big(1+\frac{2p}{n}\Big)\right)=n\log\left(\sum_ie_i^2\right)+n\log\left(1+\frac{2p}{n}\right)\approx n\log\left(\sum_ie_i^2\right)+2p,$$

which is AIC up to an additive constant that does not depend on the model.
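To see the PRESS–AIC connection concretely, here is a similar sketch (again simulated data, names my own) comparing $n\log(\mathrm{PRESS})$, computed from a single fit via the hat diagonal, with $n\log(\sum_i e_i^2)+2p$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 5

X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal(p) + rng.standard_normal(n)

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
e = Y - H @ Y                            # ordinary residuals from one full fit
h = np.diag(H)                           # leverages h_ii

press = np.sum(e**2 / (1.0 - h)**2)      # PRESS: leave-one-out SSE, no refitting

print(n * np.log(press))                 # log-PRESS criterion
print(n * np.log(np.sum(e**2)) + 2 * p)  # AIC up to a constant in n only
```

The two printed numbers agree closely; both differ from the textbook AIC only by an additive constant in $n$, which does not affect model comparisons on the same data.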

probabilityislogic
  • @probabilityislogic: Thank you for your answer. A previous discussion (http://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other) stated that "AIC is best for prediction as it is asymptotically equivalent to cross-validation." and that "AIC is equivalent to leave-one-out cross-validation." If so, can I use AIC alone (instead of cross-validation) to guard against over-fitting in prediction? – KuJ May 03 '13 at 15:44
  • @GuhJY That discussion is probably referring to [Stone. 1977. *An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion*](http://www.jstor.org/stable/2984877). – fileunderwater Jan 26 '15 at 09:28