Sibisi and Skilling (1996; the same construction appears in their 1997 paper) define the Bayesian kernel density as
$$ f(x) = \int dx' \,\phi(x')\, K(x, x') \tag{2} $$
> Here the kernel $K$ is an assigned smooth function, possibly having a few width and shape parameters. $\phi$ is defined solely by integral properties. Thus it is an underlying latent density controlling $f$, observed only indirectly via $f$. It is natural to require $K$ and $\phi$ to be non-negative, and for $K$ to be normalized over $x$. Then $\phi$ is also normalized, so belongs to Lebesgue class 1, and the kernel $K$ endows $f$ with the requisite smoothness.
>
> $\phi$ bears an analogy to the coefficients of a finite mixture model for $f$ (e.g. [6], [11], [13]). However, $\phi$ being an arbitrarily detailed density rather than a set of discrete coefficients, (2) may be interpreted as a nonparametric mixture model where $\phi$ plays the role of a full spectrum of arbitrarily many mixture coefficients.
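To fix ideas, here is an example of my own (the paper leaves $K$ generic): with a normalized Gaussian kernel
$$ K(x, x') = \frac{1}{h\sqrt{2\pi}} \exp\!\left(-\frac{(x - x')^2}{2h^2}\right), $$
equation (2) is just a convolution, $f = K_h * \phi$, so $f$ is a smoothed version of the latent density $\phi$.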
Later on, they note that taking $\phi$ to be the empirical distribution, $\phi(x') = \frac{1}{N}\sum_{n=1}^{N} \delta(x' - x_n)$, reduces (2) to the standard kernel density estimator $f(x) = \frac{1}{N}\sum_{n=1}^{N} K(x, x_n)$. They then write:
> In our approach, the task of inferring $f$ is delegated to the inferral of $\phi$, so we must assign $\phi$ a prior. But first we need to assign a measure on $\phi$-space. Here, we anticipate the requirements of practical computation by writing (2) in matrix form $f = K \Phi$ with the abscissa partitioned into some potentially large number $M$ of disjoint cells $\{\mathcal{C}_i , i = 1, \dots, M\}$, and with the latent density decomposed into corresponding amounts
> $$ \Phi_i = \int_{\mathcal{C}_i} \phi(x) \, dx \tag{4} $$
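To check how I understand the matrix form, here is a small numerical sketch of my own (the uniform partition of $[0,1]$, the Gaussian kernel, and the example latent density are all my assumptions, not the paper's):

```python
import numpy as np

# Partition the abscissa [0, 1] into M disjoint cells and take their midpoints.
M = 50
edges = np.linspace(0.0, 1.0, M + 1)
midpoints = 0.5 * (edges[:-1] + edges[1:])

# An arbitrary latent density phi(x), discretized into cell masses Phi_i as in (4).
phi = np.exp(-((midpoints - 0.3) ** 2) / 0.01)   # example shape only
Phi = phi / phi.sum()                            # normalized cell masses

# Gaussian kernel matrix K[j, i] = K(x_j, x'_i) on an evaluation grid.
h = 0.05
x_eval = np.linspace(0.0, 1.0, 200)
K = (np.exp(-0.5 * ((x_eval[:, None] - midpoints[None, :]) / h) ** 2)
     / (h * np.sqrt(2 * np.pi)))

# Matrix form f = K Phi: the smoothed density on the evaluation grid.
f = K @ Phi
```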
Next, they say that computing the posterior requires integrating over $\Phi$, so a measure must be defined on $\Phi$-space, and they suggest the Dirichlet density measure. In this construction, "smoothness is included through $K$ and not $\phi$". They suggest a flat prior for $\Phi$ and recognize it as the Dirichlet process described by Ferguson (1973, 1974). Since I am not sure whether I understand them correctly, I am looking for reassurance.
As far as I understand, from a practical point of view their approach can be read as a kind of finite mixture model: $M$ components, given by kernels $K$ centered at the cell midpoints $x'_1, \dots, x'_M$, appear with probabilities $\Phi_1, \dots, \Phi_M$ that follow a Dirichlet distribution with a uniform prior, updated as we observe cases falling into the cells. If so, the model can easily be estimated by a Monte Carlo simulation that draws the weights $\Phi_i$ from the (posterior) Dirichlet distribution and uses them as weights in the standard weighted kernel density estimator
$$ f(x) = \sum_{i=1}^M \Phi_i \, K(x, x'_i) $$
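For concreteness, here is a minimal sketch of the simulation I have in mind, assuming a flat Dirichlet prior ($\alpha_i = 1$), a Gaussian kernel, and a conjugate update by cell counts (all of these are my choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed sample, for illustration only.
data = rng.normal(0.4, 0.1, size=100)

# M cells on [0, 1] with midpoints x'_i, as in (4).
M = 50
edges = np.linspace(0.0, 1.0, M + 1)
midpoints = 0.5 * (edges[:-1] + edges[1:])

# Conjugate update: flat Dirichlet prior (alpha_i = 1) plus cell counts.
counts, _ = np.histogram(data, bins=edges)
alpha_post = 1.0 + counts

# Gaussian kernel values K(x, x'_i) on an evaluation grid.
h = 0.05
x_eval = np.linspace(0.0, 1.0, 200)
K = (np.exp(-0.5 * ((x_eval[:, None] - midpoints[None, :]) / h) ** 2)
     / (h * np.sqrt(2 * np.pi)))

# Monte Carlo: draw weights Phi ~ Dirichlet(alpha_post) and form the
# weighted kernel density estimate f(x) = sum_i Phi_i K(x, x'_i) per draw.
n_draws = 1000
Phi_draws = rng.dirichlet(alpha_post, size=n_draws)   # (n_draws, M)
f_draws = Phi_draws @ K.T                             # (n_draws, len(x_eval))
f_mean = f_draws.mean(axis=0)                         # posterior mean density
```

Each draw of $\Phi$ then gives one plausible density, and averaging the draws would approximate the posterior mean of $f$.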
Is this correct, or am I missing something?
References (unfortunately, both behind a paywall):
Sibisi, S., & Skilling, J. (1996). Bayesian Density Estimation. In Maximum Entropy and Bayesian Methods (pp. 189-198). Springer.
Sibisi, S., & Skilling, J. (1997). Prior distributions on measure space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(1), 217-235.