In deriving the parameter of a posterior, is it necessary to use the likelihood over $n$ samples?

Question

In a test I had to derive the posterior of the multinomial distribution with the conjugate Dirichlet prior. I used the common relation $$p(\mu|X;\alpha) \propto P(X|\mu) P(\mu|\alpha).$$ I did, however, assume that $X$ is a single random variable, not a data set. This led me to conclude that the posterior can be written as a Dirichlet with parameter $\alpha^*=\alpha+x$, where $\alpha$ and $x$ are of dimension $K$ (the number of classes). In the Wikipedia entry on conjugate priors, which lists the posterior parameterizations, all distributions are given for a sample of $n$ data points, hence $\alpha^*=\alpha+ \sum_{i=1}^{n}x_i$. Is my solution still correct if I want to show that the posterior is a Dirichlet and what its parameter is? More generally, I am unsure whether posterior distributions are only defined for $n$ data points $X$ (as the Wikipedia entry implies) or can be derived for a single $X$ as well.

Setting other things aside: if something is defined for $n$ points, what exactly is the problem with $n=1$? Your question is not really clear, but you can apply Bayes' theorem to a single point or to multiple points, the same as you could use least squares estimation to find the best parameter given a single point (but you won't learn anything revealing...). — Tim, Apr 25 '17 at 21:20
@Tim It seems that for showing it is a conjugate prior, it is enough to do this with $n=1$? — tomka, Apr 25 '17 at 21:33
@Tim the reason I am asking is that I have seen many textbook/lecture examples where things are shown for one $X$, and I did so myself in the test, but later I realized that I could have done for $n$. But I think now I understand that what I did is fine. — tomka, Apr 26 '17 at 06:18
Check https://stats.stackexchange.com/questions/237037/bayesian-updating-with-new-data/237109#237109 — Tim, Apr 26 '17 at 07:19
@Tim good point. sequential updating is certainly a reason to do it with one $X$ only — tomka, Apr 26 '17 at 08:02
You can do it all-at-once or sequentially, it will be the same. — Tim, Apr 26 '17 at 08:08

score 0 · Accepted Answer · answered Apr 26 '17 at 12:00

To show that the Dirichlet is a conjugate prior for the multinomial, it is indeed sufficient to use a single $X$. For estimation purposes, however, you would either use an all-at-once procedure, in which the likelihood factors over the $n$ independent samples and yields the Wikipedia result $\alpha^*=\alpha+\sum_{i=1}^{n}x_i$, or apply the single-sample update repeatedly, once for each of the $n$ samples (in any order). In sequential updating, the Dirichlet hyperparameter starts at the initial $\alpha$ and each observation $x_i$ is added in turn. The two procedures are equivalent and give the same posterior. Whether you derive the posterior parameter for one $X$ or for $n$ samples thus depends on the purpose.
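
As a quick numerical check of that equivalence, here is a minimal sketch in plain Python. The function names (`update`, `posterior_batch`, `posterior_sequential`), the prior $\alpha=(1,1,1)$ and the four one-hot observations are all made up for illustration; each $x_i$ is assumed to be a count vector of length $K$.

```python
from typing import List, Sequence


def update(alpha: Sequence[float], x: Sequence[int]) -> List[float]:
    """Single conjugate update: Dirichlet(alpha) prior plus one count vector x."""
    if len(alpha) != len(x):
        raise ValueError("alpha and x must both have length K")
    return [a + c for a, c in zip(alpha, x)]


def posterior_batch(alpha: Sequence[float], data: Sequence[Sequence[int]]) -> List[float]:
    """All-at-once: alpha* = alpha + sum_i x_i."""
    if not data:
        return list(alpha)
    # Sum the counts per class over all n samples, then update once.
    totals = [sum(column) for column in zip(*data)]
    return update(alpha, totals)


def posterior_sequential(alpha: Sequence[float], data: Sequence[Sequence[int]]) -> List[float]:
    """Sequential: the posterior after x_i becomes the prior for x_{i+1}."""
    post = list(alpha)
    for x in data:
        post = update(post, x)
    return post


# Hypothetical example: K = 3 classes, n = 4 one-hot observations.
alpha = [1.0, 1.0, 1.0]
data = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]

batch = posterior_batch(alpha, data)
sequential = posterior_sequential(alpha, data)
print(batch)       # [3.0, 2.0, 2.0]
print(sequential)  # [3.0, 2.0, 2.0]
```

Calling `update(alpha, data[0])` on its own is the $n=1$ case from the question, $\alpha^*=\alpha+x$; the batch result is just that step applied $n$ times.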
