
I am unsure about an expression in Stone's (1977) paper on the asymptotic equivalence between AIC and LOOCV. In Section 4, the third line from the bottom of page 45 starts with $L(\theta)-1(y_i|x_i,\theta)$. The second term of this expression puzzles me.

What does $1$ stand for? An indicator function?
Or should it actually be $l$ rather than $1$, meaning the likelihood of a single observation $l(y_i|x_i,\theta)$?

Richard Hardy

1 Answer


This is definitely a typo. Note that $\ell$ stands for the log-likelihood, $S=\{(x_i,y_i)\}$ for the training data, $S_{-i}$ for the training data with the $i$-th entry removed (defined right before equation (3.3)), and $$ L(\theta) = \sum_j\ell(y_j|x_j,\theta).$$

Finally, $\hat\theta(S)$ ("$\hat\theta$ for short") is defined as the maximizer of $L(\theta)$.

We are considering "$\hat\theta(S_{-i})$ ($\hat\theta_{-i}$ for short)". Per the definition of $\hat\theta(S)$, this is the maximizer of the log-likelihood computed on $S_{-i}$ instead of $S$, that is, the maximizer of

$$ \sum_{j\neq i}\ell(y_j|x_j,\theta) = L(\theta)-\ell(y_i|x_i,\theta).$$

Stone (1977) writes $L(\theta)-1(y_i|x_i,\theta)$ where the right-hand side of this identity should appear. So the $1$ should be an $\ell$ here.
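If it helps to see the identity numerically, here is a minimal sketch. The model (a Gaussian regression through the origin with unit variance), the toy data, and the function names are my own assumptions for illustration and are not taken from Stone's paper. It only checks that summing $\ell$ over $j\neq i$ gives the same number as $L(\theta)-\ell(y_i|x_i,\theta)$.

```python
import math

def loglik_point(y, x, theta, sigma=1.0):
    """Log-density ell(y | x, theta) under the assumed model y ~ N(theta * x, sigma^2)."""
    resid = y - theta * x
    return -0.5 * math.log(2 * math.pi * sigma**2) - resid**2 / (2 * sigma**2)

def full_loglik(xs, ys, theta):
    """L(theta): sum of ell over all observations in S."""
    return sum(loglik_point(y, x, theta) for x, y in zip(xs, ys))

def loo_loglik(xs, ys, theta, i):
    """Sum of ell over S_{-i}, i.e. over all observations except the i-th."""
    return sum(
        loglik_point(y, x, theta)
        for j, (x, y) in enumerate(zip(xs, ys))
        if j != i
    )

# Made-up toy data, purely illustrative.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [1.1, 1.9, 3.2, 3.9]
theta = 2.0

# For every i, the leave-one-out sum equals L(theta) minus the i-th term.
for i in range(len(xs)):
    lhs = loo_loglik(xs, ys, theta, i)
    rhs = full_loglik(xs, ys, theta) - loglik_point(ys[i], xs[i], theta)
    assert math.isclose(lhs, rhs)
```

The check holds for any $\theta$, since the identity is just a sum with one term taken out; nothing about maximization is needed for it.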

(Another argument for using $\ell$ instead of $l$. Some things do get better.)

Stephan Kolassa