8

I am relatively new to Bayesian statistics so please be gentle.

I have just performed Approximate Bayesian Computation (ABC) for the inference of a multi-parameter model. Now I am looking to perform a posterior predictive check on the parameters that have been inferred.

What I am wanting to know is that, when sampling from the posterior to generate the summary statistics for the posterior predictive check, do I sample independently from the marginal posteriors for each parameters, or am I supposed to sample the parameter values jointly (i.e. sample from the exact parameter combinations that gave rise to the accepted summary statistics).

The model contains a lot of parameters (over 6) and I am interested in the marginal posteriors for each parameter. I hope this question makes sense.

David
  • 83
  • 3

1 Answers1

8

Great question for a newcomer!!!

Your ABC algorithm provides you with a sample $\theta_1,\ldots,\theta_M$ from the ABC-posterior distribution. For each component of the vector $\theta$, you thus get a sample of size $M$ from the marginal ABC-posterior. For instance here is a toy example about the mean-variance normal posterior, when using median and mad as summaries:

#normal data with 100 observations 
x=rnorm(100)
#observed summaries
sumx=c(median(x),mad(x))

#normal x gamma prior
priori=function(N){
  return(cbind(rnorm(N,sd=10),1/sqrt(rgamma(N,shape=2,scale=5))))
  }

ABC=function(N){

  prior=priori(N)  #reference table

  #pseudo-data
  summ=matrix(0,N,2)
  for (i in 1:N){
    xi=rnorm(100)*prior[i,2]+prior[i,1]
    summ[i,]=c(median(xi),mad(xi)) #summaries
    }

  #normalisation factor for the distance
  mads=c(mad(summ[,1]),mad(summ[,2]))

  #distance
  dist=(abs(sumx[1]-summ[,1])/mads[1])+(abs(sumx[2]-summ[,2])/mads[2])

  #selection
  posterior=prior[dist<quantile(dist,.05),]

  return(posterior)
  }

If you plot

res=ABC(10^5);hist(res[,1])

you will get the marginal ABC-posterior for the normal mean.

However, if you want to do a posterior predictive check, you cannot generate one component of your posterior at a time to get pseudo-data and the corresponding summaries. You need both mean and variance to get a new normal sample! So my R code would then be

postsample=res[sample(1:length(res[,1]),10^3),]

to draw a sample from the ABC-posterior and the pseudo-data would then be generated as previously:

  #pseudo-data
  summ=matrix(0,M,2)
  for (i in 1:M){
    xi=rnorm(100)*postsample[i,2]+postsample[i,1]
    summ[i,]=c(median(xi),mad(xi)) #summaries
    }
Xi'an
  • 90,397
  • 9
  • 157
  • 575
  • 1
    Thank you very much for the thorough answer. Your example R script really made it clear to me. After I posted that question I thought more carefully about what I was asking and I was inching towards the conclusion that you gave, so it is great to have you confirm it for me :-) – David Nov 18 '14 at 22:04
  • 1
    @ Xi'an: Done. Thank you. I'm still new to this site! – David Nov 29 '14 at 15:49
  • (Also brand new to proba and ABC, I may be totally out of scope) From @Xi'an's answer, it is not really clear to me what $M$ is. I guess that it should be the number of posterior check you want to run, right? and if I am right it's then unrelated to the $M$ in $theta_M$ define in the first part, which is the number of particle selected by the ABC right? This brings me to another question: in your answer @ Xi'an you sample $10^3$ particle from the posteriors. Running your code the ABC returns me $5\times10^3$ particles. Is there some rule to choose how many posteriors check one should do? – Simon C. Jan 17 '19 at 14:18
  • Hi Robert, if you could answer this it would be greatly appreciated: https://stats.stackexchange.com/questions/468189/how-do-i-perform-an-actual-posterior-predictive-check-in-this-model –  May 24 '20 at 01:23