I am taking a look at http://pages.cs.wisc.edu/~jerryzhu/cs731/kde.pdf,
where they define the following loss function for kernel density estimates:
$$J(h) = \int \hat{f}_n^2(x)\,dx - 2\int\hat{f}_n(x)f(x)\,dx,$$ which comes from expanding the integrated squared loss $$\int(\hat{f}_n(x)-f(x))^2\,dx.$$ This loss makes intuitive sense to me, because it asks how well our kernel density estimate matches the true density.
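For reference, the expansion I'm following is
$$\int(\hat{f}_n(x)-f(x))^2\,dx = \int \hat{f}_n^2(x)\,dx - 2\int\hat{f}_n(x)f(x)\,dx + \int f^2(x)\,dx,$$
and I gather the last term is dropped because it does not depend on the bandwidth $h$, so minimizing $J(h)$ is equivalent to minimizing the full integrated squared loss.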
However, I am unable to follow the next step. They claim we can estimate $J(h)$ by $$\hat{J}(h) = \int\hat{f}_n^2(x)\,dx - \frac{2}{n}\sum_{i=1}^n\hat{f}_{-i}(x_i),$$ meaning we approximate the second term of $J(h)$ with a leave-one-out approach, where $\hat{f}_{-i}$ denotes the kernel density estimate computed with the $i$-th observation left out.
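To check my reading of the formula, here is a small numerical sketch of how I would compute $\hat{J}(h)$ (assuming a Gaussian kernel and approximating the first integral on a grid; the function names are my own, not from the notes):

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x, data, h):
    """Kernel density estimate f_hat_n evaluated at the points x."""
    return np.mean(gaussian_kernel((x - data[:, None]) / h), axis=0) / h

def J_hat(data, h, grid):
    """LOO cross-validation score:
    int f_hat_n(x)^2 dx  -  (2/n) * sum_i f_hat_{-i}(x_i)."""
    n = len(data)
    # First term: Riemann-sum approximation of the integral of f_hat_n^2
    f_grid = kde(grid, data, h)
    first_term = np.sum(f_grid**2) * (grid[1] - grid[0])
    # Second term: leave-one-out estimates evaluated at the held-out points
    loo_sum = 0.0
    for i in range(n):
        data_minus_i = np.delete(data, i)
        loo_sum += kde(np.array([data[i]]), data_minus_i, h)[0]
    return first_term - 2.0 / n * loo_sum

# Pick the bandwidth minimizing J_hat(h) on simulated data
rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-5, 5, 1000)
bandwidths = np.linspace(0.05, 1.0, 20)
scores = [J_hat(data, h, grid) for h in bandwidths]
print("bandwidth minimizing J_hat(h):", bandwidths[int(np.argmin(scores))])
```

My understanding is that the $h$ minimizing this score is the cross-validated bandwidth, but I may be misreading why the second term is a sensible stand-in for $\int\hat{f}_n(x)f(x)\,dx$.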
I really don't understand the intuition behind this leave-one-out substitution. Can anyone help clarify?
Thanks!