How do I interpret a vague prior for hierarchical modeling?

Question

I am new to Bayesian analysis and am using the following WinBUGS example to understand Bayesian hierarchical modeling:

I have 2 questions:

1) For the fixed-effect terms, i.e., beta0 and beta1, why is (0.0, 1.0E-5) considered a vague prior, as opposed to (0.0, 10,000), for example?

Does the choice of these vague-prior hyperparameters depend on whether the covariate data are standardized/normalized? (For example, would 1.0E-5 be the vague prior when the data are normalized to the range 0 to 1, and 10,000 the vague prior for non-normalized data?)

2) For tau.h and tau.c in the model, I have seen this described as a 'fair' prior. What difference would it make if both were set to (0.5, 0.0005)? I have seen it used here, for instance. Should I use the prior with the lowest DIC? And is (0.5, 0.0005) a 'fair' prior?

Any insights are appreciated.

Ben Bolker · Accepted Answer · 2019-02-06T21:44:30.480

A few quick answers:

JAGS (like WinBUGS) parameterizes the Normal distribution in terms of mean and precision (precision = 1/variance), so a precision of 1e-5 means a variance of 1e5, or a standard deviation of about 316. Whether this counts as "vague" or "weak" does depend on the scale of the covariate data: "weak" essentially means that the prior standard deviation is $\gg$ the scale of the data.
I haven't read Best et al. 1999 (as cited in your code), but Gamma(eps, eps) where eps << 1 is a typical weak prior for precisions: it gives a distribution on the positive reals with a large coefficient of variation (i.e., it is "vague") and a mean of 1 (JAGS parameterizes the Gamma with shape and rate, so the mean is shape/rate = eps/eps = 1). This is again slightly sensitive to the scaling of the relevant covariate.
You should be aware that the Gamma(eps, eps) prior (which is used in part because it's the conjugate prior for the precision of a Normal distribution, and thus mathematically and computationally convenient) has been shown to have some bad properties when the data are not very informative (so that the prior has an effect); it often gives unrealistically large peak densities near zero. See e.g. Gelman 2006.
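
The conversions above can be checked numerically. The following is a minimal Python sketch, not part of the original model code: the helper names `precision_to_sd` and `gamma_summary` are made up for illustration, `eps = 0.001` is an arbitrary example value, and it assumes the question's (0.5, 0.0005) means a Gamma prior with shape 0.5 and rate 0.0005.

```python
import math

def precision_to_sd(precision):
    """Convert a Normal precision (1/variance) to a standard deviation."""
    return 1.0 / math.sqrt(precision)

def gamma_summary(shape, rate):
    """Mean, variance and coefficient of variation of Gamma(shape, rate)."""
    mean = shape / rate
    variance = shape / rate ** 2
    cv = math.sqrt(variance) / mean  # algebraically equal to 1 / sqrt(shape)
    return mean, variance, cv

# Normal prior with precision 1e-5, as used for beta0 and beta1.
sd = precision_to_sd(1.0e-5)
print(f"precision 1e-5 -> sd {sd:.1f}")

# Gamma(eps, eps) with an illustrative eps (assumed value, not from the source).
eps = 0.001
mean, variance, cv = gamma_summary(eps, eps)
print(f"Gamma({eps}, {eps}): mean {mean:.3g}, CV {cv:.3g}")

# The alternative from the question, read as shape 0.5 and rate 0.0005.
mean_alt, variance_alt, cv_alt = gamma_summary(0.5, 0.0005)
print(f"Gamma(0.5, 0.0005): mean {mean_alt:.3g}, CV {cv_alt:.3g}")
```

Under these assumptions the two Gamma priors are quite different: Gamma(eps, eps) is centered on a precision of 1 with a very large coefficient of variation, while Gamma(0.5, 0.0005) has a much larger mean precision and a coefficient of variation of about 1.4.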

Gelman, Andrew. “Prior Distributions for Variance Parameters in Hierarchical Models.” Bayesian Analysis 1, no. 3 (2006): 515–33.

These are some useful insights. When you say `std dev ≫ the scale of the data`, would you say a `std dev` of `1.0E-5` is not feasible to use when the covariate data is normalized between 0 and 1 (since the `std dev` is very small)? If so, what is an appropriate value for `std dev` in this case? — user121, Jul 29 '18 at 02:09
Yes. If the covariate data is normalized between 0 and 1, then I'd say a std dev of at least 2-3 would be needed for a weak prior; if you want it to be effectively flat, then say >10? — Ben Bolker, Jul 29 '18 at 09:53
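
One rough way to see what "weak" versus "effectively flat" means here is to compare the Normal(0, sd) prior density at the edge of a plausible range with its density at the center. The sketch below is only an illustration under an assumption: it takes ±1 as the plausible range of the coefficient, which is not stated anywhere in the thread, and the function name `normal_density_ratio` is made up for the example.

```python
import math

def normal_density_ratio(x, sd):
    """Density of Normal(0, sd) at x divided by its density at 0."""
    return math.exp(-0.5 * (x / sd) ** 2)

# Assumed plausible range of the coefficient: -1 to 1 (illustrative only).
edge = 1.0

# A ratio close to 1 means the prior is nearly flat over the assumed range.
for sd in (0.5, 3.0, 10.0, 316.0):
    ratio = normal_density_ratio(edge, sd)
    print(f"sd = {sd:>5}: density at edge / density at center = {ratio:.4f}")
```

With sd = 3 the prior density drops by only a few percent across the assumed range, and with sd = 10 or more it is nearly constant, which matches the rule of thumb in the comment above.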
