In a test I had to derive the posterior for the parameter of the multinomial distribution with the conjugate Dirichlet prior. I used the common relation $$p(\mu|X;\alpha) \propto P(X|\mu) P(\mu|\alpha).$$ I did, however, assume that $X$ is a single observation, not a data set. This led me to conclude that the posterior is a Dirichlet with parameter $\alpha^*=\alpha+x$, where $\alpha$ and $x$ are both of dimension $K$ (the number of classes). In the Wikipedia entry on conjugate priors, which lists the posterior parameterizations, all distributions are given for a sample of $n$ data points, hence $\alpha^*=\alpha+ \sum_{i=1}^{n}x_i$. Is my solution still correct if the task is to show that the posterior is a Dirichlet and to state its parameter? More generally, I am unsure whether posterior distributions are only defined for $n$ data points $X$ (as the Wikipedia entry implies) or can be derived for a single $X$ as well.
-
Setting other things aside: if something is defined for $n$ points, what exactly is the problem with $n=1$? Your question is not really clear, but you can apply Bayes' theorem to a single point or to multiple points, just as you could use least squares estimation to find the best parameter given a single point (though you won't learn anything revealing from it). – Tim Apr 25 '17 at 21:20
-
@Tim It seems that for showing it is a conjugate prior, it is enough to do this with $n=1$? – tomka Apr 25 '17 at 21:33
-
Why shouldn't it? – Tim Apr 25 '17 at 21:53
-
@Tim The reason I am asking is that I have seen many textbook/lecture examples where things are shown for one $X$, and I did so myself in the test, but later I realized that I could have done it for $n$. I think I now understand that what I did is fine. – tomka Apr 26 '17 at 06:18
-
Check https://stats.stackexchange.com/questions/237037/bayesian-updating-with-new-data/237109#237109 – Tim Apr 26 '17 at 07:19
-
@Tim Good point. Sequential updating is certainly a reason to do it with one $X$ only. – tomka Apr 26 '17 at 08:02
-
You can do it all-at-once or sequentially, it will be the same. – Tim Apr 26 '17 at 08:08
1 Answer
To show that the Dirichlet is a conjugate prior for the multinomial, it is indeed sufficient to use a single $X$. For estimation purposes, however, one would either use an all-at-once procedure, in which the likelihood factors over the $n$ independent samples and which yields the Wikipedia result $\alpha^*=\alpha+\sum_{i=1}^{n}x_i$, or apply the single-observation update repeatedly, each time with one (randomly selected) not-yet-used sample out of the $n$. In the sequential version, the Dirichlet hyperparameter starts at the initial $\alpha$ and changes by adding one observation $x_i$ at a time. The two procedures are equivalent. Which form of the posterior parameter one writes down thus depends on the purpose.
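The equivalence of the two procedures can be checked numerically. The following is only a minimal sketch: the prior `alpha`, the class probabilities `mu_true`, the sample size `n`, and the seed are made-up values for illustration, and each observation $x_i$ is assumed to be a one-hot vector (a single multinomial trial).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical values, chosen only for illustration (K = 3 classes).
alpha = np.array([1.0, 2.0, 0.5])    # prior Dirichlet hyperparameters
mu_true = np.array([0.2, 0.5, 0.3])  # class probabilities used to simulate data
n = 10                               # number of observations

# n one-hot observations x_i, one multinomial trial each; shape (n, K).
X = rng.multinomial(1, mu_true, size=n)

# All-at-once update: alpha* = alpha + sum_i x_i
alpha_batch = alpha + X.sum(axis=0)

# Sequential update: apply the single-observation rule alpha <- alpha + x_i
# once per observation.
alpha_seq = alpha.copy()
for x in X:
    alpha_seq = alpha_seq + x

# Posterior after seeing only one observation (the n = 1 case).
alpha_single = alpha + X[0]

print("batch:     ", alpha_batch)
print("sequential:", alpha_seq)
print("single x:  ", alpha_single)
```

Because the update only adds counts, the order in which the observations are processed does not matter, and `alpha_batch` and `alpha_seq` coincide. The $n=1$ result `alpha_single` is simply the first step of the sequential procedure.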

tomka