
I have two data sets of scalar values: a large one (about 700 data points) and a small one (80 data points). I would like to update the large data set with the small one using Bayes' theorem, and so create another large data set (the posterior).

The large data set serves as the prior; it is assumed to be normally distributed, and so is the posterior. This choice was motivated by the closed-form expressions for the posterior parameters under a conjugate prior, given at https://en.wikipedia.org/wiki/Conjugate_prior (the first row in the table for continuous distributions).

However, if I substitute the mean and variance of the prior (inferred from the large data set) and of the local data (inferred from the small data set) into the closed-form expressions for the posterior mean and variance, the resulting posterior distribution does not make sense.

Am I wrong to think that I can simply substitute the known values into these closed-form expressions to get the posterior distribution?

Tim
Vasek
  • Could you give an example of what you are actually doing..? What do you substitute? – Tim Sep 01 '16 at 10:11
  • From the large set I observed mean=20.1, std=7.9 and from the small set: mean=20.6, std=9.5. I've simply taken these observed values, considering the large set as my prior, and substituted them into the formulas for the posterior's mean and std. – Vasek Sep 01 '16 at 10:20
  • What exactly do you substitute, and how? And why "doesn't it make sense"? – Tim Sep 01 '16 at 10:29
  • Using the same notation as in https://en.wikipedia.org/wiki/Conjugate_prior, I substitute **mu0=20.1**, **sigma0=7.9** and **mu=20.6**, **sigma=9.5**, **n=70** (n is the size of the small sample), **sum(xi)=1448.7** into the formulas for the posterior parameters (the first row in the table for continuous distributions) and obtain the posterior parameters **mu=20.7** and **sigma=1.2**. Now, **mu** looks like a plausible value; **sigma**, however, is far off compared to the values from the data sets. – Vasek Sep 01 '16 at 11:19

1 Answer


First of all, the formulas are defined in terms of variance, not standard deviations.

Second, the variance of the posterior is not the variance of your data but the variance of the estimated parameter $\mu$. As you can see from the description ("Normal with known variance $\sigma^2$"), this is the formula for estimating $\mu$ when $\sigma^2$ is known. The prior parameters $\mu_0$ and $\sigma_0^2$ are the parameters of the distribution of $\mu$, hence the assumed model is

$$ \begin{align} X_i &\sim \mathrm{Normal}(\mu, \sigma^2) \\ \mu &\sim \mathrm{Normal}(\mu_0, \sigma_0^2) \end{align} $$
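To make the distinction concrete, here is a minimal sketch of the known-variance update applied to the numbers quoted in the comments. It assumes, as the asker did, that the small sample's standard deviation (9.5) can stand in for the known $\sigma$; the function name `posterior_mu_known_variance` is made up for this illustration.

```python
import math

def posterior_mu_known_variance(mu0, sigma0_sq, sigma_sq, n, sum_x):
    """Posterior mean and variance of mu for Normal data with KNOWN variance sigma_sq.

    All spread arguments are variances, not standard deviations.
    """
    post_precision = 1.0 / sigma0_sq + n / sigma_sq
    post_var = 1.0 / post_precision
    post_mean = post_var * (mu0 / sigma0_sq + sum_x / sigma_sq)
    return post_mean, post_var

# Numbers from the comments; the standard deviations are squared first.
# Assumption: sigma = 9.5 is treated as the known sd of the data.
sigma_sq = 9.5 ** 2
mu_n, var_n = posterior_mu_known_variance(
    mu0=20.1, sigma0_sq=7.9 ** 2, sigma_sq=sigma_sq, n=70, sum_x=1448.7
)

sd_of_mu = math.sqrt(var_n)                 # uncertainty about mu only
sd_of_new_x = math.sqrt(sigma_sq + var_n)   # posterior predictive sd of a new observation

print(round(mu_n, 2), round(sd_of_mu, 2), round(sd_of_new_x, 2))
```

With these inputs the posterior mean comes out near 20.7 and the posterior standard deviation of $\mu$ near 1.1, close to the values reported in the comments. That small number describes how precisely $\mu$ is pinned down, while the spread of a new observation stays close to $\sigma$.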

When both $\mu$ and $\sigma^2$ are unknown and have to be estimated, you need a slightly more complicated model (in the Wikipedia table under "$\mu$ and $\sigma^2$ Assuming exchangeability"):

$$ \begin{align} X_i &\sim \mathrm{Normal}(\mu, \sigma^2) \\ \mu &\sim \mathrm{Normal}(\mu_0, \tfrac{\sigma^2}{\nu}) \\ \sigma^2 &\sim \mathrm{IG}(\alpha, \beta) \end{align} $$

where we first need to update the parameters of the inverse gamma distribution of $\sigma^2$:

$$ \begin{align} \alpha' &= \alpha + \frac{n}{2} \\ \beta' &= \beta + \frac{1}{2}\sum_{i=1}^n (x_i -\bar x)^2 + \frac{n\nu(\bar x -\mu_0)^2}{2(n+\nu)} \end{align} $$

and then we can calculate the posterior mean of $\mu$ and the MAP point estimate of $\sigma^2$:

$$ \begin{align} \mu &= \frac{ \mu_0\nu + \bar x n }{\nu + n} \\ \operatorname{Mode}(\sigma^2) &= \frac{ \beta' }{ \alpha' + 1 } \end{align} $$
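The update above can be sketched in a few lines of Python. The hyperparameter choices below are assumptions made only for illustration and are not part of the answer: the large data set is encoded as $\nu = 700$ prior pseudo-observations with $\alpha = \nu/2$ and $\beta = \alpha \cdot 7.9^2$, and the small sample's sum of squared deviations is approximated as $(n-1)\cdot 9.5^2$. The function name `nig_update` is likewise hypothetical.

```python
import math

def nig_update(mu0, nu, alpha, beta, n, xbar, ss):
    """Conjugate update for Normal data with unknown mean and variance.

    ss is the sum of squared deviations, sum((x_i - xbar)**2).
    Returns the updated (mu, nu, alpha, beta).
    """
    nu_n = nu + n
    mu_n = (nu * mu0 + n * xbar) / nu_n
    alpha_n = alpha + n / 2.0
    beta_n = beta + 0.5 * ss + n * nu * (xbar - mu0) ** 2 / (2.0 * nu_n)
    return mu_n, nu_n, alpha_n, beta_n

# Assumed prior: the large set (mean 20.1, sd 7.9) as 700 pseudo-observations.
nu0 = 700
alpha0 = nu0 / 2.0
beta0 = alpha0 * 7.9 ** 2

# Small sample from the comments: n = 70, sum(x_i) = 1448.7, sd = 9.5.
n = 70
xbar = 1448.7 / n
ss = (n - 1) * 9.5 ** 2   # assumes 9.5 is the sample sd with an n - 1 denominator

mu_n, nu_n, alpha_n, beta_n = nig_update(20.1, nu0, alpha0, beta0, n, xbar, ss)
sigma_sq_map = beta_n / (alpha_n + 1.0)   # mode of the inverse gamma posterior

print(round(mu_n, 2), round(math.sqrt(sigma_sq_map), 2))
```

Under these assumptions the posterior mean lands between the two sample means and the MAP standard deviation between 7.9 and 9.5, both pulled toward the larger data set, which is the kind of behaviour the question was expecting. Different choices of $\nu$, $\alpha$ and $\beta$ would weight the prior differently.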

To learn more, refer to the paper "Conjugate Bayesian analysis of the Gaussian distribution" by Kevin Murphy, or the notes "The Conjugate Prior for the Normal Distribution" by Michael Jordan (notice that there are slight differences between those two sources and that some formulas are given for the precision $\tau$ rather than the variance), and M. DeGroot, Optimal Statistical Decisions, McGraw-Hill, 1970 (pp. 169-171).

Tim
  • Thank you. This is useful!! My follow-up question is: how should one decide on the **σ²** value? In my application, where I'm "updating" the large data set with the properties of the small data set, the posterior (resulting) variance is not known. – Vasek Sep 01 '16 at 14:59
  • @Vasek If it is not known then you should not use this approach... Basically, this is just a simple "textbook" example of using conjugate priors, and most real-life applications would require more complicated models (e.g. estimated using MAP or MCMC). – Tim Sep 01 '16 at 15:02
  • Thanks for explaining. I'll then have to find some "textbook" (meaning something practical) explanation of how to carry out such a more complicated approach for the application of interest. It's a somewhat surprising conclusion; I'm basically replicating the methodology of some authors who (in multiple papers) used these conjugate-prior closed-form expressions, explicitly mentioning that this way they could avoid the need for MCMC samplers. – Vasek Sep 01 '16 at 15:17
  • Well, here we want to "simulate" an outcome (the posterior). A priori, we do not know any of its parameters, so the closed-form expressions cannot be put to use (as you explained previously), unless one assumes some asymptotic case in which one of the posterior parameters is exactly equal to the corresponding parameter in the observed data sets (and the remaining parameter could then be computed). So it's not really obvious how they came up with the results without using some sampler. But this is perhaps a question for the authors of the methodology that I mentioned earlier. – Vasek Sep 01 '16 at 16:18
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/44825/discussion-between-vasek-and-tim). – Vasek Sep 02 '16 at 06:59
  • Thanks for the great answer, Tim! Sorry to keep asking, but I have one follow-up question. You write: "this is formula for estimating μ when σ² is known." Is this the posterior σ² or the historically observed σ² that we could use as a proxy for the variance of μ? Thanks! – Daniel C Jan 25 '17 at 12:54
  • @DanielChorzelski as stated in the answer, this is the formula for when the value is known in advance. The second formula shows how to calculate the mean and sd from the data. – Tim Jan 25 '17 at 13:29