Say I want to estimate the mean $\mu \in [0, 10] $ of some Gaussian data $\mathbf{x}$ with known variance $\sigma^2 = 1$ using MCMC. Usually I'd use a prior like $\mu \sim \mathrm{Uniform}(0, 10)$ and end up with samples $\hat{\mu}$ that are distributed like this (where $\mu = 5$). This is pretty much what I'd expect, and works fine.
Since the samples $\hat{\mu}$ are also Gaussian, though, it seems like I should be able to estimate their mean and variance precisely by sampling them directly, using priors like these:
- $\mu \sim \mathrm{Norm}(\mu_0, \sigma^2_0)$
- $\sigma^2_0 \sim \mathrm{Uniform}(0, 2)$
- $\mu_0 \sim \mathrm{Uniform}(0, 10)$
I hoped to get samples of $\mu_0$ and $\sigma^2_0$ that were tightly concentrated around the mean ($\approx 5$) and variance ($\approx 0.02$) of $\hat{\mu}$, but instead neither estimate was any more precise: $\mu_0$ is spread out a lot and $\sigma^2_0$ is essentially uniform.
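(For reference, those target values follow from the standard Gaussian result: with an effectively flat prior on $\mu$ and known $\sigma^2 = 1$, the posterior for $\mu$ is approximately $\mathrm{Norm}(\bar{x}, \sigma^2/N)$, so with $N = 50$ the posterior variance is $1/50 = 0.02$. A quick NumPy sketch of that check, separate from the PyMC run below and with variable names of my own choosing:)

```python
import numpy as np

# simulate data like the test set below: N draws from Norm(mu=5, sigma^2=1)
rng = np.random.RandomState(0)
N, mu_true, sigma2 = 50, 5.0, 1.0
x = rng.randn(N) * np.sqrt(sigma2) + mu_true

# with a flat prior and known variance, the posterior for mu is
# approximately Norm(xbar, sigma^2 / N)
post_mean = x.mean()     # close to 5
post_var = sigma2 / N    # exactly 0.02 here

print(post_mean, post_var)
```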
Is there a better set of priors/hyperpriors that would be more suitable for sampling in cases like this? I suspect that using uniform, independent hyperpriors might be the issue, but I'm not sure.
Here's my PyMC code BTW (possibly there's a bug):
from __future__ import division
import pymc as mc
from pylab import *
# create test data: N draws from Norm(mu=5, sigma^2=1)
N = 50
mu_true = 5
obs = randn(N) + mu_true
# set up PyMC variables (note: PyMC's Normal takes a precision tau, hence 1/variance)
mu_0 = mc.Uniform(r'$\mu_0$', 0, 10)
sigma_0 = mc.Uniform(r'$\sigma_0^2$', 0, 2)
mu = mc.Normal(r'$\mu$', mu_0, 1 / sigma_0)
data = mc.Normal('data', mu, 1, observed=True, value=obs)
# sample
mcmc = mc.MCMC([data, mu, mu_0, sigma_0])
mcmc.sample(iter=50000, burn=5000)
# plot a histogram of each trace
figure()
for i, v in enumerate((r'$\mu$', r'$\mu_0$', r'$\sigma_0^2$')):
    x = mcmc.trace(v)[:].reshape(-1)
    subplot(1, 3, i + 1)
    hist(x, 50)
    title(v)
show()