
I'm reading Chapter 13, "Adventures in Covariance", in the (superb) book Statistical Rethinking by Richard McElreath, where he presents the following hierarchical model:

Model

[model specification omitted in transcription; R is a correlation matrix]

The author explains that LKJcorr is a weakly informative prior that works as a regularizing prior for the correlation matrix. But why is that so? What characteristics of the LKJcorr distribution make it such a good prior for correlation matrices? What other priors are used in practice for correlation matrices?

xboard

1 Answer


The LKJ distribution is an extension of the work of H. Joe (1). Joe proposed a procedure for generating correlation matrices uniformly over the space of all positive definite correlation matrices. The contribution of (2) is to extend Joe's work with a more efficient way of generating such samples.

The parameterization commonly used in software such as Stan exposes a single concentration parameter, $\eta$, that controls how closely the sampled matrices resemble the identity matrix. This means you can move smoothly from sampling matrices that are all very nearly $I$ (large $\eta$) to matrices that are more-or-less uniform over the positive definite correlation matrices ($\eta = 1$).
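To make this concrete, here is a minimal NumPy sketch of the vine-based sampler described in (2) (this is an illustration, not Stan's actual implementation; the function name `sample_lkj` is mine):

```python
import numpy as np

def sample_lkj(d, eta, rng):
    """Draw one d x d correlation matrix from LKJ(eta) via the C-vine method of (2).

    Partial correlations are sampled from a Beta(beta, beta) distribution
    rescaled to (-1, 1), then recursively converted to raw correlations.
    """
    beta = eta + (d - 1) / 2.0
    P = np.zeros((d, d))  # partial correlations
    R = np.eye(d)
    for k in range(d - 1):
        beta -= 0.5
        for i in range(k + 1, d):
            P[k, i] = 2.0 * rng.beta(beta, beta) - 1.0  # partial corr in (-1, 1)
            p = P[k, i]
            # Convert the partial correlation to a raw correlation.
            for l in range(k - 1, -1, -1):
                p = p * np.sqrt((1 - P[l, i] ** 2) * (1 - P[l, k] ** 2)) + P[l, i] * P[l, k]
            R[k, i] = R[i, k] = p
    return R

rng = np.random.default_rng(0)
R_diffuse = sample_lkj(4, eta=1.0, rng=rng)    # roughly uniform over correlation matrices
R_tight = sample_lkj(4, eta=200.0, rng=rng)    # concentrated near the identity
```

Increasing `eta` shrinks the off-diagonal entries toward zero, which is exactly the regularizing behavior McElreath describes.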

An alternative manner of sampling from correlation matrices, called the "onion" method, is found in (3). (No relation to the satirical news magazine -- probably.)

Another alternative is to sample from a Wishart distribution, whose draws are positive (semi-)definite, and then divide out the variances to leave a correlation matrix. Some downsides of the Wishart/inverse-Wishart approach are discussed in "Downsides of inverse Wishart prior in hierarchical models".
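The "divide out the variances" step can be sketched in a few lines with SciPy (an illustrative sketch; the helper name `wishart_correlation` is mine):

```python
import numpy as np
from scipy.stats import wishart

def wishart_correlation(d, df, rng):
    """Draw a d x d Wishart matrix and rescale it to a correlation matrix.

    Dividing row i and column j by the standard deviations sqrt(S[i,i]),
    sqrt(S[j,j]) leaves ones on the diagonal.
    """
    S = wishart.rvs(df=df, scale=np.eye(d), random_state=rng)
    sd = np.sqrt(np.diag(S))
    return S / np.outer(sd, sd)

rng = np.random.default_rng(0)
R = wishart_correlation(4, df=6, rng=rng)
```

Note that, unlike the LKJ concentration parameter, the Wishart degrees of freedom and scale jointly influence both the implied variances and the implied correlations, which is one source of the difficulties discussed in the linked thread.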

(1) H. Joe. "Generating random correlation matrices based on partial correlations." Journal of Multivariate Analysis, 97 (2006), pp. 2177-2189

(2) Daniel Lewandowski, Dorota Kurowicka, Harry Joe. "Generating random correlation matrices based on vines and extended onion method." Journal of Multivariate Analysis, Volume 100, Issue 9, 2009, Pages 1989-2001

(3) S. Ghosh, S.G. Henderson. "Behavior of the NORTA method for correlated random vector generation as the dimension increases." ACM Transactions on Modeling and Computer Simulation (TOMACS), 13 (3) (2003), pp. 276-294

Sycorax
  • Would you elaborate more on why InvWishart can get singular with high probability? – Albert Chen Oct 26 '20 at 18:16
  • 1
    That's a good catch. I have a recollection of what I was thinking when I wrote this answer -- I have a distinct memory of a plot that shows Wishart-type priors being singular. I'm looking at the resources I was reading at the time and I haven't been able to find that material. I'll keep poking around and if I can't find it, I'll revise the Answer. – Sycorax Oct 26 '20 at 18:41
  • It's possible that I was thinking specifically of "non-informative" Wishart-family distributions (maybe??), but I've removed the reference because I can't pin down what exactly I meant. – Sycorax Oct 26 '20 at 19:09
  • 1
    The [pymc doc on the LKJ prior](https://docs.pymc.io/notebooks/LKJ.html) links to [this github issue](https://github.com/pymc-devs/pymc3/issues/538#issuecomment-94153586), which links to [this youtube video](https://www.youtube.com/watch?v=xWQpEAyI5s8) by Michael Betancourt for an explanation of why Inverse-Wishart is a bad idea. Disclaimer: I did not watch it. – jhin May 02 '21 at 09:54
  • Another discussion of the downsides of InvWishart can be found in [this answer](https://stats.stackexchange.com/a/198731/131402). – jhin May 02 '21 at 10:02