What methods can be used to specify priors from data?

Question

Background

I am generally interested in learning appropriate methods of using data to specify priors. A previous question asks how to elicit priors from experts and received some good recommendations. Here, I am interested in learning how to specify a prior using data. I plan to use these priors in a meta-analysis to synthesize additional data that I collect.

Update: Although John provides a 'correct' answer, implementing it in my case would require substantial modification of the original model, so I would prefer to find a way to estimate the prior as a discrete step.

Questions

What is the best way to specify such a data informed prior?
If I am working with parameters for a particular species (monkeys), this species belongs to a group of organisms (primates), and data are available for primates but not for monkeys, would it be appropriate to fit a distribution based on the primate data?

Example cases, first with proposed solution

I have 100 observations of thumb length from 100 primate species:
```
set.seed(0)
thumb <- rgamma(100, 4, 0.1)
library(MASS)
fitdistr(thumb, 'gamma')
```
Indeed, when there is no a priori reason to select a particular distribution, the distribution can be chosen by comparing maximum likelihood fits:
```
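# compare candidate distributions by their maximized log-likelihood
for (dist in c('gamma', 'lognormal', 'weibull')) {
  # print() is needed: values are not auto-printed inside a for loop
  print(logLik(fitdistr(thumb, dist)))
}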
```
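The comparison above only prints the log-likelihoods. A minimal sketch of keeping the best-fitting candidate and its parameters for later use as the prior follows; the names `fits`, `best`, and `prior` are illustrative and not part of the original post. All three candidates have two parameters, so ranking by log-likelihood does not favor a more flexible family.

```r
library(MASS)  # fitdistr()

set.seed(0)
thumb <- rgamma(100, 4, 0.1)  # same simulated data as above

# fit each candidate distribution by maximum likelihood
candidates <- c('gamma', 'lognormal', 'weibull')
fits <- lapply(candidates, function(d) fitdistr(thumb, d))
names(fits) <- candidates

# maximized log-likelihood of each fit
ll <- sapply(fits, function(f) as.numeric(logLik(f)))

# keep the candidate with the highest log-likelihood
best  <- names(which.max(ll))
prior <- list(distribution = best, parameters = coef(fits[[best]]))
print(prior)
```

The resulting `prior$parameters` are point estimates, so a prior built this way ignores uncertainty in the fitted parameters.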
For eye diameter, I have collected means, standard errors, and sample sizes from 50 primate species, plus 50 independent observations from another 50 species:
```
eye <- data.frame( diameter = rgamma(100, 4, 0.1),
                   se       = c(rlnorm(50, 0.5,1), rep(NA, 50)),
                   n        = c(rep(1:5, 10), rep(1, 50)))      
eye <- signif(eye, 3)
```
How can I incorporate the sample statistics into my calculation of a prior?

score 8 · Accepted Answer · edited Apr 13 '17 at 12:44

If you have all this data, I think the best answer is to fit a single large model using hierarchical modeling, rather than doing it in two steps (generating a prior, then fitting a model). This is basically the answer I gave to this question. I explain this a little bit more there.

In a hierarchical model, each of the parameters you are interested in (for example, the location and scale parameters of the thumb lengths for a species) is modeled as a draw from a common prior distribution. The hyperparameters of the hierarchical model parameterize this common prior distribution of the species-level parameters, and you estimate the hyperparameters at the same time as the parameters you're interested in. The hyperparameters of course need their own prior distribution, but it can be relatively diffuse.
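To make the structure concrete, here is a minimal generative sketch in base R. It only simulates the hierarchy and does not fit it, and every number and name (`mu.group`, `sd.group`, `sd.within`) is a hypothetical placeholder. For brevity only the species means are drawn from the common distribution; a fuller model would treat the scale parameters the same way.

```r
set.seed(1)

# hypothetical hyperparameters of the common (group-level) distribution
mu.group <- 40   # mean thumb length across species
sd.group <- 8    # between-species standard deviation

# each species' mean is itself a draw from the common distribution
n.species    <- 10
species.mean <- rnorm(n.species, mean = mu.group, sd = sd.group)

# observations within a species vary around that species' mean
n.obs     <- 5
sd.within <- 3
obs <- data.frame(species = rep(seq_len(n.species), each = n.obs))
obs$length <- rnorm(nrow(obs), mean = species.mean[obs$species], sd = sd.within)

head(obs)
```

Fitting the hierarchical model runs this in reverse: given `obs`, it estimates `mu.group`, `sd.group`, and every entry of `species.mean` jointly.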

answered Dec 16 '10 at 03:31

John Salvatier


Nice answer. I can see where you are going with this and I think that it is the 'correct' answer. However, I have chosen not to do this (yet) because of the time it would require: I have developed the model (itself hierarchical) to be very flexible and to accept a variety of types of data already, and I would need to build this generality into the extended model as well. So your answer is correct, although I would really like to know how to calculate a prior as a discrete step. – David LeBauer Dec 16 '10 at 18:16
Makes sense. Might want to add a bit about that to description, so others don't think I have answered the question you're asking. – John Salvatier Dec 16 '10 at 23:41
I have updated my question, and have proposed an answer. Having pondered this further, I can see why my approach is not ideal, e.g. I would be losing information by assuming a distribution rather than using the full posterior MCMC chain; on the other hand, my model is relatively insensitive to the choice of priors, so it is difficult for me to justify using the alternative approach. – David LeBauer Dec 17 '10 at 20:11

score 4 · Answer 2 · answered Jan 18 '11 at 15:16

A useful way to incorporate data into a prior distribution is the principle of maximum entropy. You provide constraints that the prior distribution must satisfy (e.g. mean, variance, etc.) and then choose the distribution that is most "spread out" among those satisfying these constraints.

The distribution generally has the form $p(x) \propto \exp(...)$

Edwin Jaynes was the originator of this principle, so searching for his work is a good place to start.

See the wiki page (http://en.wikipedia.org/wiki/Principle_of_maximum_entropy) and links therein for a more detailed description.
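As a small illustration, suppose the only constraints taken from the data are a mean and a standard deviation, and the parameter may lie anywhere on the real line (an assumption; thumb lengths are in fact positive). The sketch below compares the closed-form differential entropies of three distributions with the same standard deviation, using the simulated `thumb` data from the question. The candidate set is chosen only for illustration.

```r
set.seed(0)
thumb <- rgamma(100, 4, 0.1)  # simulated data from the question

# constraints taken from the data
m <- mean(thumb)
s <- sd(thumb)

# differential entropy (in nats) of three distributions that all have
# standard deviation s; none of these expressions depends on the mean m
entropy <- c(
  normal  = 0.5 * log(2 * pi * exp(1) * s^2),
  laplace = 1 + log(2 * s / sqrt(2)),  # Laplace scale b = s / sqrt(2)
  uniform = log(s * sqrt(12))          # uniform width = s * sqrt(12)
)
print(entropy)

# the normal has the largest entropy of the three, consistent with it being
# the maximum entropy distribution on the real line for a fixed mean and variance
maxent.prior <- c(mean = m, sd = s)
```

Different constraints lead to different families; for example, fixing only the mean of a positive quantity gives an exponential distribution.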

David LeBauer · Answer 3 · 2014-04-08T21:43:23.227

I propose the following solution to 2) (the case with means, standard errors, and sample sizes), and would appreciate feedback:

Data include mean, $Y$, sample size $n$, and standard error $\sigma$; calculate precision ($\tau=\frac{1}{\sigma\sqrt{n}}$) because it is required for logN parameterization by BUGS
data $Y\sim \text{N}(\beta_0,\tau)$
precision $\tau\sim\text{Gamma}(\frac{n}{2},\frac{n}{2\tau})$
diffuse priors
use $N(\mu=\beta_0, \sigma=\frac{1}{\sqrt{\tau}})$ prior

Here is the code:

```
library(rjags)
data <- data.frame(Y  = c(1.6, 2.5, 1.8, 1.8, 1.7, 2.5),
                   n  = c(4, 4, 4, 3, 4, 3),
                   se = c(0.2, 0.41, 0.24, 0.27, 0.2, 0.14))
# convert se to precision, then drop the se column so that only
# Y, n and obs.prec are passed to JAGS
data$obs.prec <- 1 / data$se
data$se <- NULL

# write the BUGS model to the file 'model.bug'
# (sink() does not capture code that follows it in a script,
#  so the model is stored as a string and written with writeLines())
model.string <- "
model
{
    for (k in 1:length(Y)) {
        Y[k] ~ dnorm(beta.o, tau.y[k])
        tau.y[k] <- prec.y * n[k]
        u1[k] <- n[k]/2
        u2[k] <- n[k]/(2 * prec.y)
        obs.prec[k] ~ dgamma(u1[k], u2[k])
    }
    beta.o ~ dnorm(3, 0.0001)
    prec.y ~ dgamma(0.001, 0.001)
    sd.y  <- 1/sqrt(prec.y)
}
"
writeLines(model.string, con = 'model.bug')

# compile the model and run the adaptation phase
model  <- jags.model(file = "model.bug",
                     data = data,
                     n.adapt = 500,
                     n.chains = 4)

# sample the posterior of the prior mean (beta.o) and sd (sd.y)
mcmc.object <- coda.samples(model = model,
                            variable.names = c('beta.o', 'sd.y'),
                            n.iter = 10000,
                            thin = 50)
summary(mcmc.object)
```
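The precision conversion deserves a cross-check, because the text gives $\frac{1}{\sigma\sqrt{n}}$ while the code uses `1/se`. The sketch below shows a third variant for comparison. It assumes that `se` is the standard error of the mean (so the within-species standard deviation is `se * sqrt(n)`) and that precision means 1/variance, as in the BUGS `dnorm` parameterization. The column name `obs.sd` is illustrative, and this is not confirmed as the conversion the author intended.

```r
data <- data.frame(Y  = c(1.6, 2.5, 1.8, 1.8, 1.7, 2.5),
                   n  = c(4, 4, 4, 3, 4, 3),
                   se = c(0.2, 0.41, 0.24, 0.27, 0.2, 0.14))

# assumption: se is the standard error of the mean, i.e. se = sd / sqrt(n)
data$obs.sd   <- data$se * sqrt(data$n)  # implied within-species SD
data$obs.prec <- 1 / data$obs.sd^2       # precision = 1 / variance
print(data)
```

Whichever conversion is used, it should match what the `dgamma` line in the model assumes about `obs.prec`.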

Update

I have revised this approach to compute a posterior predictive distribution. It required some modifications, mostly computing a posterior predictive distribution for an unobserved sample.

Details here:

David S. LeBauer, Dan Wang, Katherine T. Richter, Carl C. Davidson, and Michael C. Dietze. 2013. Facilitating feedbacks between field measurements and ecosystem models. Ecological Monographs 83:133–154. http://dx.doi.org/10.1890/12-0137.1

Examples of this and simpler approaches here: https://github.com/dlebauer/pecan-priors/blob/master/priors_demo.Rmd
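The posterior predictive step mentioned in the update can be sketched generically as follows. This is not necessarily the procedure used in the paper. The posterior draws here are stand-ins generated with `rnorm()` and `runif()` so that the block runs without JAGS; in practice they would be taken from the MCMC output.

```r
set.seed(2)

# stand-in posterior draws (hypothetical values); with real output these
# columns would come from the sampler, e.g. via as.matrix(mcmc.object)
n.draws <- 4000
beta.o  <- rnorm(n.draws, mean = 2, sd = 0.2)
sd.y    <- runif(n.draws, min = 0.3, max = 0.8)

# one predicted value for a new, unobserved sample per posterior draw
y.new <- rnorm(n.draws, mean = beta.o, sd = sd.y)

# summarise the predictive draws as a Normal prior
prior <- c(mean = mean(y.new), sd = sd(y.new))
print(prior)
```

Because each predicted value uses its own draw of `beta.o` and `sd.y`, the spread of `y.new` reflects both sampling variability and parameter uncertainty, so it is wider than a Normal built from the posterior means alone.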

is `sd.y` the appropriate estimate of $\sigma$ to use in the prior? — David LeBauer, Dec 17 '10 at 20:23
this still doesn't answer your question about the mixing of summary statistics and independent observations. It is not clear if the model handles independent observations appropriately. — David LeBauer, Dec 17 '10 at 20:25
