13

I'm trying a very simple model: fitting a Normal where I assume I know the precision, and I just want to find the mean. The code below seems to fit the Normal correctly. But after fitting, I want to sample from the model, ie generate new data which is similar to my data variable. I know I can use trace("mean") to get samples for the mean variable. But how can I get new samples from the model itself?

I've looked at docs eg http://pymc-devs.github.io/pymc/database.html#accessing-sampled-data. I've also looked at quite a few examples, eg the mining disasters, and several from the Probabilistic Programming notebooks, and none mention this. I (more or less an MCMC beginner) expected that sampling from the fitted model was the whole point! What am I missing?

from pymc import *
data = np.array([-1, 0, 4, 0, 2, -2, 1, 0, 0, 2, 1, -3, -1, 0, 0, 1, 0, 1])
mean = Uniform("mean", -4, 4)
precision = 2.0**-2
obs = Normal("obs", mean, precision, value=data, observed=True)
model = Model( {"mean": mean, "obs": obs})
mcmc = MCMC(model)
mcmc.sample(10000, 1000, 1)
# I can get samples for the "mean" variable
mean_samples = mcmc.trace("mean")[:]
hist(mean_samples)
# but how can I do the equivalent of mcmc.trace("obs")?
jmmcd
  • 363
  • 3
  • 9

2 Answers2

15

You are looking the what's called the predictive distribution. To include this is very simple. Before creating the Model, add the additional stochastic variable:

predictive = mc.Normal( "predictive", mean, precision )
model = Model( {"mean": mean, "obs": obs, "pred":predictive})

...

predictive_traces = mcmc.trace("predictive")[:]
hist( predictive_traces )

Artificial data from the fitted model

This will generate artifical data from the fitted model. Thanks for bringing this oversight to my attention, I'll include it in the BMH project.

MD004
  • 187
  • 1
  • 1
  • 7
Cam.Davidson.Pilon
  • 11,476
  • 5
  • 47
  • 75
  • How do you create an array of n random variables in which n is random? https://stackoverflow.com/questions/45283843/pymc-what-s-the-best-way-to-create-an-array-with-a-random-number-of-random-var (Sorry is this is too much...) – drake Jul 25 '17 at 19:15
5

Landed here several years later when looking for the same thing using PyMC3, so I am going to leave an answer relevant to the new version: (from Posterior Predictive Checks).

ppc = pm.sample_ppc(trace, samples=500, model=model, size=100)

Now, ppc contains 500 generated data sets (containing 100 samples each), each using a different parameter setting from the posterior.

Jan Kukacka
  • 10,121
  • 1
  • 36
  • 62