
In the context of a Normal-Gamma conjugate prior, the scale $\eta_0$ can be interpreted as a count of "virtual observations", which multiplies the variance. Why then does a higher number of "virtual observations" imply a higher variance of $\mu$? I would have thought more "observations" would lead to lower variance. Is "virtual observations" a misnomer?

See page 13 of this doc.


stevew

1 Answer


Notice what they say:

To simplify the algebra we will work with precisions instead of variances. ... in summary, by starting with a normal-gamma prior, we obtain a normal-gamma posterior; i.e., we have found a conjugate prior for the mean and precision of the Gaussian.

They parametrize the Gaussian by its mean and precision $\tau$, where the variance is the inverse of the precision. Gathering more data leads to higher precision, and higher precision means lower variance, so everything behaves as expected.
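To make this concrete, here is a minimal sketch using the standard Normal-Gamma update formulas (notation assumed: $\mu \mid \tau \sim N(\mu_0, 1/(\eta_0 \tau))$ and $\tau \sim \mathrm{Gamma}(\alpha_0, \beta_0)$; the parameter names are mine, not from the linked doc). The scale $\eta_n$ accumulates the observation count, so the conditional variance of $\mu$ shrinks as data arrives:

```python
import numpy as np

# Prior hyperparameters (illustrative values)
mu0, eta0, alpha0, beta0 = 0.0, 1.0, 2.0, 2.0

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=100)  # some observed data
n, xbar = len(x), x.mean()

# Standard Normal-Gamma conjugate update
eta_n = eta0 + n                                  # "virtual observations" accumulate
mu_n = (eta0 * mu0 + n * xbar) / eta_n
alpha_n = alpha0 + n / 2
beta_n = (beta0 + 0.5 * ((x - xbar) ** 2).sum()
          + eta0 * n * (xbar - mu0) ** 2 / (2 * eta_n))

# Conditional on tau, Var(mu | tau) = 1 / (eta_n * tau):
# a larger eta_n means higher precision, i.e. LOWER variance of mu.
def var_mu_given_tau(tau):
    return 1.0 / (eta_n * tau)

print(eta_n, var_mu_given_tau(1.0))  # eta_n grows from 1 to 101 here
```

So $\eta$ multiplies the *precision*, not the variance, which resolves the apparent paradox in the question.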

The "virtual observations" are most easily explained using beta-binomial or Dirichlet-categorical models, where the prior parameters can be thought of as counts of "successes" observed a priori. With other models, this intuition may be harder to grasp.
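As a quick illustration of the pseudo-count intuition in the beta-binomial case (a sketch, assuming the common convention that the Beta parameters act directly as prior success/failure counts; conventions differ by an offset of 1 in some texts):

```python
# Beta-Binomial conjugate update: prior Beta(a, b) behaves like
# a prior "successes" and b prior "failures" already observed.
a, b = 3, 2                    # prior pseudo-counts (assumed convention)
successes, failures = 7, 5     # observed data

# Posterior is Beta(a + successes, b + failures): counts just add up.
a_post, b_post = a + successes, b + failures
posterior_mean = a_post / (a_post + b_post)

print(a_post, b_post, posterior_mean)  # 10 7 0.588...
```

Here more observations (real or virtual) always sharpen the posterior, which matches the usual intuition the question appeals to.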

Tim
  • Thanks, I've misread. I've got confused with the precision parameterisation. – stevew Dec 28 '21 at 22:56
  • 1
    @stevew it's uncommon parametrization, but you can see it sometimes in Bayesian literature because, as they say, using gamma prior for precision gives a little bit more simpler algebra that using inverse-gamma for variance. – Tim Dec 29 '21 at 09:24