I have been wondering about a passage from section 2.5.1, *Kernel density estimators* (p. 122), of Bishop's *Pattern Recognition and Machine Learning*. The passage follows an explanation of how a density function $p$ can be approximated, based on the following argument:
- Let the random variable $X \sim p$. Consider a small region $R$ containing the point $x$ at which we want to estimate the density, and let $P = P(X \in R) = \int_R p(x') \:\mathrm{d}x'$.
- Assume we have a data set of $N$ samples drawn independently from $p$. Then $K$, the number of samples that land inside $R$, is binomially distributed: $K \sim \mathrm{Bin}(N, P)$.
- If $N$ is large, the variance of the r.v. $K/N$, namely $P(1-P)/N$, is close to zero, and so we expect $K/N$, the maximum-likelihood estimator of $P$, to give a good estimate of $P$.
- Assume that $R$ is so small that $p$ is approximately constant over $R$. Then $P \approx p(x)V$, where $V$ is the volume of $R$.
- Combining the two gives the density estimate $p(x) = \frac{K}{NV}$ (2.246); a quick numerical check of this follows the list.
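As a sanity check on (2.246), here is a minimal sketch, assuming a 1-D standard normal for $p$; the values of $N$, the evaluation point $x_0$, and the half-width $h$ of $R$ are my own arbitrary choices, not Bishop's.

```python
# Numerical check of p(x) ≈ K/(NV), assuming p is a 1-D standard normal.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 100_000                 # large N, so K/N concentrates around P
x0 = 0.5                    # point at which we estimate the density
h = 0.05                    # half-width of the region R = [x0 - h, x0 + h]
V = 2 * h                   # "volume" (length) of R in one dimension

samples = rng.normal(size=N)            # N draws from p = N(0, 1)
K = np.sum(np.abs(samples - x0) < h)    # samples falling inside R

print(f"estimate K/(NV) = {K / (N * V):.4f}")
print(f"true p(x0)      = {norm.pdf(x0):.4f}")
```

The passage that puzzles me then reads: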
> Note that the validity of (2.246) depends on two contradictory assumptions, namely that the region $R$ be sufficiently small that the density is approximately constant over the region and yet sufficiently large (in relation to the value of that density) that the number $K$ of points falling inside the region is sufficient for the binomial distribution to be sharply peaked.
Bishop, Christopher M. *Pattern Recognition and Machine Learning*. Springer, 2006.
I am not sure I understand the second assumption, i.e. why $R$ must be sufficiently large, relative to the value of the density, for the binomial distribution of $K$ to be sharply peaked.
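To try to see the tension concretely, I ran the sketch below, again assuming a 1-D standard normal, with $N$ fixed and the half-width of $R$ shrinking (all constants are my own choices). For the widest region the estimate looks biased, presumably because $p$ is not constant over $R$; for the narrowest one the expected $K$ is only a handful, and the spread of the estimate across repeated data sets blows up.

```python
# Fixed N, shrinking R: my attempt to reproduce the tension in the quote.
# The density (standard normal), N, x0, and the widths are my own choices.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
N, x0, repeats = 1_000, 0.5, 200

for h in (0.5, 0.05, 0.005):      # half-widths of R = [x0 - h, x0 + h]
    V = 2 * h
    estimates = []
    for _ in range(repeats):
        samples = rng.normal(size=N)              # a fresh data set from p
        K = np.sum(np.abs(samples - x0) < h)      # points falling inside R
        estimates.append(K / (N * V))             # the estimate (2.246)
    estimates = np.asarray(estimates)
    print(f"h={h:5.3f}  mean={estimates.mean():.3f}  "
          f"std={estimates.std():.3f}  true p(x0)={norm.pdf(x0):.3f}")
```

Is this spread what Bishop means by the binomial distribution failing to be sharply peaked, and if so, how exactly does "sufficiently large in relation to the value of that density" capture it?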