Why is the Dirichlet Process unsuitable for applications in Bayesian nonparametrics?

Question

"The discrete nature of the DP makes it unsuitable for general applications in Bayesian nonparametrics, but it is well suited for the problem of placing priors on mixture components in mixture modeling."

This quote is from Hierarchical Dirichlet Processes (Teh et al., 2006)$^{[1]}$, and I am looking for an explanation of what it means. "Bayesian nonparametrics" seems too vague a term for me to understand what the authors are referring to.

$^{[1]}$ Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006). "Hierarchical Dirichlet Processes", Journal of the American Statistical Association, 101: 1566–1581.

I believe the 'discrete' description refers to the fact that draws from a Dirichlet process are discrete with probability one (it follows from the stick breaking representation of the DP). — ankit, Oct 20 '13 at 00:20
You're going to have to elaborate. If I break a stick into $k$ pieces in some fashion, the distributions of the stick lengths are continuous. — Glen_b, Oct 20 '13 at 00:22
@Glen_b: Your intuition matches mine, but the paper ankit linked to says "that draws from a DP are discrete (with probability one)". I can't follow their argument, but I respect the authors. — David J. Harris, Oct 20 '13 at 03:02
@DavidJ.Harris yes, reading up about it, it seems - inconsistently with the way the word 'process' is more usually associated with distributions - to be referring to what I'd have called something like a 'multinomial process' or 'multinomial mixture', since the output is the category. (This naming scheme would be kind of like referring to inter-event times as a 'Poisson process', rather than the count of the number of events as is normally the case, or perhaps referring to a Bernoulli process as a 'beta process' because there was a beta prior on the Bernoulli probability.) — Glen_b, Oct 20 '13 at 06:09
It depends on whether you think a "countably infinite" number of real numbers is representative of the real numbers. I would have thought that it is, thus providing an argument against the above claim. — probabilityislogic, Oct 20 '13 at 09:04

Zen · Accepted Answer · 2013-10-20T04:59:27.027

With probability one, the realizations of a Dirichlet Process are discrete probability measures. A rigorous proof can be found in

Blackwell, D. (1973). "Discreteness of Ferguson Selections", The Annals of Statistics, 1(2): 356–358.

The stick breaking representation of the Dirichlet Process makes this property transparent.

1. Draw independent $B_i\sim\mathrm{Beta}(1,c)$, for $i\geq 1$.
2. Define $P_1=B_1$ and $P_i=B_i \prod_{j=1}^{i-1}(1-B_j)$, for $i>1$.
3. Draw independent $Y_i\sim F$, for $i\geq 1$.

Sethuraman proved that the discrete distribution function $$ G(t,\omega)=\sum_{i=1}^\infty P_i(\omega) I_{[Y_i(\omega),\infty)}(t) $$ is a realization of a Dirichlet Process with concentration parameter $c$ and centered at the distribution function $F$.
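
The construction above can be simulated to see the discreteness directly. The following is a minimal sketch, not part of the original answer: it truncates the infinite sum at `n_atoms` terms (an approximation), takes $F$ to be the standard normal, and uses $c=2$; the function name `stick_breaking_draw` and all parameter values are illustrative assumptions.

```python
import numpy as np

def stick_breaking_draw(c, base_sampler, n_atoms, rng):
    """Truncated stick-breaking draw; returns (weights P_i, atoms Y_i)."""
    b = rng.beta(1.0, c, size=n_atoms)                    # B_i ~ Beta(1, c)
    # prod_{j<i} (1 - B_j), with the empty product equal to 1 for i = 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - b)[:-1]))
    weights = b * remaining                               # P_i
    atoms = base_sampler(rng, n_atoms)                    # Y_i ~ F
    return weights, atoms

rng = np.random.default_rng(0)
weights, atoms = stick_breaking_draw(
    c=2.0,
    base_sampler=lambda r, k: r.standard_normal(k),       # assumed F = N(0, 1)
    n_atoms=1000,                                         # truncation level (assumption)
    rng=rng,
)

# Sample from the (truncated, renormalized) realization G.
probs = weights / weights.sum()
sample = rng.choice(atoms, size=50, replace=True, p=probs)
n_distinct = np.unique(sample).size

print("total weight captured by truncation:", weights.sum())
print("distinct values among 50 draws from G:", n_distinct)
```

Although $F$ is continuous here, the 50 draws from $G$ contain repeated values, because $G$ puts all of its mass on the countable set of atoms $Y_i$. A sample from $F$ itself would contain no ties.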

The expectation of this Dirichlet Process is simply $F$, which may be the distribution function of a continuous random variable. However, if the random variables $X_1,\dots,X_n$ form a random sample from a distribution drawn from this Dirichlet Process, the posterior expectation is a probability measure that puts positive mass on each sample point.
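
The posterior expectation can be written out explicitly. This is the standard conjugacy result for the Dirichlet Process (due to Ferguson), stated here in the notation of this answer rather than quoted from it: given $X_1,\dots,X_n$, the posterior is again a Dirichlet Process, with concentration parameter $c+n$, and its expectation is $$ \mathrm{E}[G(t)\mid X_1,\dots,X_n]=\frac{c}{c+n}\,F(t)+\frac{1}{c+n}\sum_{i=1}^n I_{[X_i,\infty)}(t). $$ The first term is continuous whenever $F$ is, but the second term places mass $1/(c+n)$ on each observed point, so the posterior expectation has no density with respect to Lebesgue measure for any $n\geq 1$.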

Regarding the original question, you can see that the plain Dirichlet Process may be unsuitable for some problems in Bayesian nonparametrics, such as Bayesian density estimation, but suitable extensions of the Dirichlet Process are available to handle these cases.
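
One commonly used extension of this kind is the Dirichlet Process mixture, in which the discrete $G$ is used as a mixing distribution over a continuous kernel. The answer does not name a specific extension, so treating the DP mixture as the intended one is an assumption. The sketch below is only illustrative: it reuses truncated stick-breaking, assumes $F=N(0,1)$, and uses a normal kernel with a fixed, hypothetical `bandwidth`.

```python
import numpy as np

def dp_mixture_density(grid, c, bandwidth, n_atoms, rng):
    """Evaluate one random density f(x) = sum_i P_i * Normal(x; Y_i, bandwidth^2)."""
    b = rng.beta(1.0, c, size=n_atoms)                    # B_i ~ Beta(1, c)
    weights = b * np.concatenate(([1.0], np.cumprod(1.0 - b)[:-1]))
    weights = weights / weights.sum()                     # renormalize after truncation
    atoms = rng.standard_normal(n_atoms)                  # Y_i ~ F, assumed N(0, 1)
    # Normal kernel centred at each atom, evaluated at each grid point.
    z = (grid[:, None] - atoms[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ weights

rng = np.random.default_rng(1)
grid = np.linspace(-8.0, 8.0, 2001)
density = dp_mixture_density(grid, c=2.0, bandwidth=0.5, n_atoms=500, rng=rng)

# Riemann-sum check that the realization integrates to roughly one.
area = np.sum(density) * (grid[1] - grid[0])
print("approximate integral of the random density:", area)
```

Each realization is now a smooth density rather than a discrete measure, and prior beliefs about smoothness can be expressed through the kernel, which is not possible with the plain Dirichlet Process.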

edited Oct 20 '13 at 04:59

answered Oct 20 '13 at 00:31

Why is it bad to estimate a density by a discrete distribution? Does this mean quadrature is also bad and inappropriate? – probabilityislogic Oct 20 '13 at 09:05
I didn't say it is "bad". But suppose that you have good prior information about the smoothness of the random density. You can't use this prior information if you are modelling with the plain DP. That's the kind of thing that I have in mind. – Zen Oct 20 '13 at 14:14
I would disagree - smoothness can be controlled by the choice of the concentration parameter, and by the shape of the base distribution. – probabilityislogic Oct 22 '13 at 03:11
If you're modelling with the original DP, using any base measure, the posterior distribution never has a density with respect to Lebesgue measure. – Zen Oct 22 '13 at 03:34
You are confusing having a density with being smooth - a discrete distribution doesn't have a density either, but that doesn't mean it's not smooth - for example a binomial(n,p) with n large is basically as smooth as a normal pdf – probabilityislogic Oct 22 '13 at 06:45
No, I'm not confusing anything. Real samples are *finite*. So, if you believe that a binomial with, say, $n=25$ is "smooth", good luck and good bye. – Zen Oct 22 '13 at 18:15
