I've been reading Wagenmakers (2007) A practical solution to the pervasive problem of p values. I'm intrigued by the conversion of BIC values into Bayes factors and probabilities. However, so far I don't have a good grasp of what exactly a unit information prior is. I would be grateful for an explanation with pictures, or the R code to generate pictures, of this particular prior.
2 Answers
The unit information prior is a data-dependent prior (typically multivariate normal), with mean at the MLE and precision equal to the information provided by one observation. See e.g. this tech report, or this paper for full details. The idea of the UIP is to give a prior that 'lets the data speak for itself'; in most cases the addition of a prior that tells you only as much as a single observation, centered where the rest of the data are 'pointing', will have little impact on the subsequent analysis. One of its main uses is in showing that the use of BIC corresponds, in large samples, to the use of Bayes factors with UIPs on their parameters.
It's probably also worth noting that many statisticians (including Bayesians) are uncomfortable with the use of Bayes factors and/or BIC for many applied problems.
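Since the question asks for pictures or R code, here is a minimal R sketch (the simulated data, seed, and variable names are just illustrative assumptions) of the UIP for a normal mean with known variance: the prior is centred at the MLE $\bar{x}$ but is only as precise as a single observation, so it is much flatter than the likelihood from the full sample.

```r
## Unit information prior for a normal mean (sigma assumed known)
set.seed(1)
sigma <- 1
n     <- 25
x     <- rnorm(n, mean = 0.5, sd = sigma)   # illustrative data
xbar  <- mean(x)                            # MLE of mu

mu <- seq(xbar - 4, xbar + 4, length.out = 500)
uip        <- dnorm(mu, mean = xbar, sd = sigma)            # information of ONE observation
likelihood <- dnorm(mu, mean = xbar, sd = sigma / sqrt(n))  # information of n observations

plot(mu, likelihood, type = "l", lwd = 2,
     xlab = expression(mu), ylab = "density",
     main = "Unit information prior vs. likelihood")
lines(mu, uip, lwd = 2, lty = 2, col = "blue")
legend("topright", c("likelihood (n observations)", "unit information prior (1 observation)"),
       lty = c(1, 2), col = c("black", "blue"), lwd = 2)
```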

- BIC is not a Bayesian tool, as it removes the impact of the prior. As a Bayesian, I am comfortable with Bayes factors, but not with AIC, BIC, nor DIC! – Xi'an Apr 12 '12 at 05:32
- Well, I never said it was! As a Bayesian (who has read and who values Bayesian Choice) I'd be happy with any of those methods if they had some decision-theoretic justification, even approximately, for a utility that reflected what I wanted the analysis to achieve. – guest Apr 12 '12 at 05:51
- Thanks for the responses. I've asked a follow up question [here](http://stats.stackexchange.com/questions/26339/the-unit-information-prior-and-its-bic-approximation-pt-2) – Matt Albrecht Apr 12 '12 at 06:39
The unit information prior is based on the following interpretation of conjugacy:
Set up
- Normal data: $X^{n}=(X_{1}, \ldots, X_{n})$ with $X_{i} \sim \mathcal{N}( \mu, \sigma^{2})$, where $\mu$ is unknown and $\sigma^2$ is known. The data can then be sufficiently summarised by the sample mean, which, before any datum is seen, is distributed as $\bar{X} \sim \mathcal{N}(\mu, \tfrac{\sigma^{2}}{n} )$.
- Normal prior for $\mu$: $\mu \sim \mathcal{N} (a, \sigma^{2})$, with the same variance as a single observation.
- Normal posterior for $\mu$: $\mu \sim \mathcal{N} (M, v)$, where $M=\tfrac{1}{n+1}(a + n \bar{x})$ and $v= \tfrac{\sigma^2}{n+1}$ (these formulas are checked numerically in the sketch after this list).
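Here is a small R sketch (my own check; the sample size, prior mean, and seed are arbitrary assumptions) that verifies the closed-form posterior mean and variance above against a brute-force grid computation of the posterior:

```r
## Check the closed-form posterior N(M, v) against a brute-force grid posterior
sigma <- 1; n <- 10; a <- 0                  # known sd, sample size, prior mean
set.seed(2)
xbar <- mean(rnorm(n, mean = 1, sd = sigma)) # observed sample mean

M <- (a + n * xbar) / (n + 1)                # closed-form posterior mean
v <- sigma^2 / (n + 1)                       # closed-form posterior variance

mu.grid <- seq(-5, 5, length.out = 1e5)
unnorm  <- dnorm(xbar, mu.grid, sigma / sqrt(n)) * dnorm(mu.grid, a, sigma)
post    <- unnorm / sum(unnorm)              # normalised grid posterior

c(M, sum(mu.grid * post))                               # posterior means agree
c(v, sum((mu.grid - sum(mu.grid * post))^2 * post))     # posterior variances agree
```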
Interpretation
Hence, after observing the data $\bar{X}=\bar{x}$, we have a posterior for $\mu$ that is centred at a convex combination of the observation $\bar{x}$ and of what was postulated before the data were observed, that is, $a$. Furthermore, the posterior variance is $\tfrac{\sigma^{2}}{n+1}$, as if we had $n+1$ observations rather than $n$, compared with the sampling distribution of the sample mean. Note that a sampling distribution is not the same thing as a posterior distribution; nonetheless, the posterior looks very much like it, letting the data speak for themselves. Hence, with the unit information prior one gets a posterior that is mostly concentrated on the data, $\bar{x}$, and shrunk towards the prior information $a$ only as a one-off penalty worth a single observation.
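To turn this into the picture the question asks for, here is a minimal R sketch (again with illustrative data, seed, and parameter values of my own choosing) drawing the unit information prior, the likelihood, and the resulting posterior; the posterior sits almost on top of the likelihood, slightly narrower and pulled slightly towards $a$.

```r
## Prior, likelihood, and posterior for mu under the unit information prior
sigma <- 1; n <- 20; a <- 0
set.seed(3)
xbar <- mean(rnorm(n, mean = 1, sd = sigma))

M <- (a + n * xbar) / (n + 1)                # posterior mean
v <- sigma^2 / (n + 1)                       # posterior variance

mu <- seq(-2, 3, length.out = 500)
plot(mu, dnorm(mu, M, sqrt(v)), type = "l", lwd = 2,
     xlab = expression(mu), ylab = "density")
lines(mu, dnorm(mu, xbar, sigma / sqrt(n)), lty = 2, lwd = 2, col = "blue")  # likelihood
lines(mu, dnorm(mu, a, sigma),              lty = 3, lwd = 2, col = "red")   # UIP
legend("topright", c("posterior", "likelihood", "unit information prior"),
       lty = 1:3, col = c("black", "blue", "red"), lwd = 2)
```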
Kass and Wasserman furthermore showed that the Bayes factor for the model selection problem $M_{0}:\mu=a$ versus $M_{1}: \mu \in \mathbf{R}$, with the prior given above on $\mu$ under $M_{1}$, can be well approximated with the Schwarz criterion (basically, BIC/2) when $n$ is large.
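A small R sketch of that approximation (my own numerical illustration; the exact Bayes factor here just follows from the marginal density of $\bar{x}$ under each model): the exact $BF_{01}$ under the unit information prior and the Schwarz/BIC-based approximation $\exp(-S)$ get closer as $n$ grows.

```r
## Exact Bayes factor under the unit information prior vs. the Schwarz/BIC approximation
sigma <- 1; a <- 0
set.seed(4)
for (n in c(10, 100, 1000)) {
  xbar <- mean(rnorm(n, mean = a, sd = sigma))   # data generated under M0 for illustration
  # exact BF01: density of xbar under M0 vs. its marginal under M1 (mu ~ N(a, sigma^2))
  bf01.exact <- dnorm(xbar, a, sigma / sqrt(n)) /
                dnorm(xbar, a, sigma * sqrt(1 + 1 / n))
  # Schwarz criterion S = log-likelihood ratio - (1/2) log n, giving BF01 ~ exp(-S)
  z <- sqrt(n) * (xbar - a) / sigma
  S <- z^2 / 2 - 0.5 * log(n)
  print(c(n = n, exact = bf01.exact, BIC.approx = exp(-S)))
}
```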
Some remarks:
- The fact that BIC approximates a Bayes factor based on a unit information prior does not imply that we should use a unit information prior to construct a Bayes factor. Jeffreys's (1961) default choice is instead a Cauchy prior on the effect size; see also Ly et al. (in press) for an explanation of Jeffreys's choice.
- Kass and Wasserman showed that the BIC divided by a constant (that relates the Cauchy to a normal distribution) can still be used as an approximation of the Bayes factor (this time based on a Cauchy prior instead of a normal one).
References
- Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford University Press, Oxford, UK.
- Kass, R. E., & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90, 928-934.
- Ly, A., Verhagen, A. J., & Wagenmakers, E.-J. (in press). Harold Jeffreys's default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology.
