
I am trying to conduct a meta-analysis of a bounded parameter, say a correlation parameter $\rho \in [0, 1]$. I have multiple studies $i = 1, \dots, N$, each producing a likelihood $L_i(\rho_i)$ with $\rho_i \in [0, 1]$. The studies have different sample sizes, and therefore different precisions for the $\rho_i$ estimates (which is presumably already reflected in the flatness of each likelihood curve $L_i(\rho_i)$).

Based on prior knowledge, this parameter $\rho_i$ is likely close to 1, so the usual normal asymptotics are questionable because of the bounded parameter space.

The main interests are:

  1. Assessing the heterogeneity of $\rho_i$ across studies.
  2. Producing a single summary of $\rho$.
  3. Testing $H_0: \rho_i = 1$ for all $i = 1, \dots, N$, accounting for the different precisions of the $\rho_i$.

Are there any references that could guide the analysis of such problems?

And if one is willing to further assume $\rho_i \sim \text{Uniform}[0, 1]$, is there an established procedure for performing a Bayesian meta-analysis on the posterior distributions $\pi_i(\rho_i)$, $i = 1, \dots, N$?

david
  • Have you thought about transforming your parameter into an unbounded region (e.g. using inverse sigmoid or similar)? – J. Delaney Feb 06 '22 at 18:46
  • I have a not-quite-finished paper that addresses all the issues you raise. I'm working on an answer that provides an introduction and summary. Do you mind telling me what the subject matter is? – mef Feb 06 '22 at 19:24
  • @J.Delaney Transforming the parameter to an unbounded space and relying on a normal assumption to meta-analyze the mean and standard error seems plausible, but the statistical properties are not so clear, so I am searching for a more principled solution. – david Feb 06 '22 at 20:00
  • @mef That sounds very interesting. The subject matter is meta-analyzing genetic correlation, e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5005434/. Essentially, each study corresponds to a correlation of genetic effects for a particular trait, and one would like to meta-analyze across traits. I believe other fields surely have similar problems, and I look forward to your answer. – david Feb 06 '22 at 20:05
  • Excellent! Can you describe the functional forms of the likelihoods? They don't need to be restricted to the unit interval; the prior will take care of that. – mef Feb 06 '22 at 21:10
  • I am not sure what you mean specifically by the functional form of the likelihoods. A simplified description is $L(\sigma^2, \rho) = \text{MVN}(\text{data}; 0, \begin{bmatrix} \sigma^2 & \rho \sigma^2 \\ \rho \sigma^2 & \sigma^2 \end{bmatrix})$. One can optimize jointly over $\sigma^2$ and $\rho$ and describe the likelihood of $\rho$ (treating $\sigma^2$ as a nuisance parameter). We can therefore obtain the likelihood $L(\rho)$ for every $\rho \in [0, 1]$. – david Feb 06 '22 at 21:26
  • That's exactly what I want! (Sorry I didn't see your response until just now.) Mathematica can integrate out $\sigma^2$ (using an inverse gamma prior or an improper prior proportional to 1 or $1/\sigma^2$) producing the likelihood for $\rho$. That's great. But the likelihood has a value of zero at $\rho = 1$ (unless the two variables in the data are exactly collinear, in which case the likelihood is infinite there). This behavior of the likelihood will cause difficulties in evaluating the hypothesis $\rho_i = 1$. Your interests 1 and 2 can still be addressed. – mef Feb 07 '22 at 18:30
  • If these are correlations why not use the hyperbolic arctangent transformation (Fisher's z)? That is what most meta-analysts do. – mdewey Feb 10 '22 at 15:30
  • Are the $\sigma^2$ variables known? Sometimes the standard error is assumed to be known in a meta-analysis. – Eli Feb 10 '22 at 15:46

1 Answer


As I mentioned in the comments, your likelihood makes assessing the probability of $\rho_i = 1$ difficult if not impossible. I present a framework that addresses the other two main interests.

One way to think about the overall problem is as latent-variable density estimation. The summary would then be the "predictive" distribution for $\rho_{N+1}$. (I put "predictive" in quotes since the term is typically applied to observables.)

If you observed $\rho_{1:N} = (\rho_1, \ldots, \rho_N)$ then you could apply Bayesian density estimation directly and compute the predictive distribution: \begin{equation} p(\rho_{N+1}|\rho_{1:N}) = \int p(\rho_{N+1}|\theta)\,p(\theta|\rho_{1:N})\,d\theta , \end{equation} where \begin{equation} p(\theta|\rho_{1:N}) = \frac{p(\rho_{1:N}|\theta)\,p(\theta)}{p(\rho_{1:N})} \end{equation} and \begin{equation} p(\rho_{1:N}|\theta) = \prod_{i = 1}^N p(\rho_i|\theta) . \end{equation} The main choices are the form of $p(\rho_i|\theta)$ and the prior for $\theta$. [I'll get back to this for your case later.]

But instead of observing $\rho_{1:N}$ directly, you have the "indirect observations" \begin{equation} p(Y_{1:N}|\rho_{1:N}) = \prod_{i=1}^N p(Y_i|\rho_i) , \end{equation} where (for notational convenience) I have written \begin{equation} p(Y_i|\rho_i) = L_i(\rho_i) . \end{equation} In this case we can average the predictive distribution according to the posterior distribution of the latent variables: \begin{equation} p(\rho_{N+1}|Y_{1:N}) = \int p(\rho_{N+1}|\rho_{1:N})\,p(\rho_{1:N}|Y_{1:N})\,d\rho_{1:N} , \end{equation} where \begin{equation} p(\rho_{1:N}|Y_{1:N}) = \frac{p(Y_{1:N}|\rho_{1:N})\,p(\rho_{1:N})}{p(Y_{1:N})} . \end{equation}

Although the preceding approach is conceptually straightforward, there is a more direct approach that is simpler to implement: \begin{equation} p(\rho_{N+1}|Y_{1:N}) = \int p(\rho_{N+1}|\theta)\,p(\theta|Y_{1:N})\,d\theta , \end{equation} where \begin{equation} p(\theta|Y_{1:N}) = \frac{p(Y_{1:N}|\theta)\,p(\theta)}{p(Y_{1:N})} \end{equation} and \begin{equation} p(Y_{1:N}|\theta) = \prod_{i=1}^N p(Y_i|\theta) \end{equation} and \begin{equation} p(Y_i|\theta) = \int p(Y_i|\rho_i)\,p(\rho_i|\theta)\,d\rho_i . \end{equation} Along the way you would obtain the posterior distributions for each of the $N$ entities: \begin{equation} p(\rho_i|Y_{1:N}) = \int p(\rho_i|Y_i,\theta)\,p(\theta|Y_{1:N})\,d\theta, \end{equation} where \begin{equation} p(\rho_i|Y_i,\theta) = \frac{p(Y_i|\rho_i)\,p(\rho_i|\theta)}{p(Y_i|\theta)} . \end{equation}
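To make the direct approach concrete, here is a minimal numerical sketch in Python. The Gaussian-shaped `toy_likelihood` is a fabricated stand-in for a study's actual likelihood curve $L_i(\rho)$ (not the MVN form from the comments), and the integral $p(Y_i|\theta) = \int p(Y_i|\rho_i)\,p(\rho_i|\theta)\,d\rho_i$ is approximated by a Riemann sum on a grid over $[0, 1]$:

```python
# A minimal sketch of the integrated likelihood p(Y_i | theta),
# approximated by a Riemann sum over a grid on [0, 1].
import numpy as np

rho_grid = np.linspace(0.0, 1.0, 1001)
drho = rho_grid[1] - rho_grid[0]

def toy_likelihood(rho, rho_hat=0.9, se=0.05):
    """Fabricated stand-in for a study's likelihood curve L_i(rho);
    any positive function evaluated on the grid would do."""
    return np.exp(-0.5 * ((rho - rho_hat) / se) ** 2)

def integrated_likelihood(L_vals, prior_vals):
    """Approximate the integral of L_i(rho) * p(rho | theta) on the grid."""
    return np.sum(L_vals * prior_vals) * drho

# Example: under a Uniform(0, 1) prior the integral is just the
# average height of the likelihood curve.
L_vals = toy_likelihood(rho_grid)
print(integrated_likelihood(L_vals, np.ones_like(rho_grid)))
```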

As noted above, the main choices are the form of the prior for the parameter, $p(\rho_i|\theta)$, and the prior for the hyperparameter, $p(\theta)$. Here is one choice that is particularly well-suited to the unit interval. Let \begin{equation} p(\rho_i|\theta) = \sum_{j=1}^J \theta_j\,f_j(\rho_i) \end{equation} where $\theta = (\theta_1,\ldots, \theta_J)$ is a vector of non-negative weights that sum to one and \begin{equation} f_j(\rho_i) = \textsf{Beta}(\rho_i|j,J-j+1) . \end{equation} This prior is a mixture of beta distributions; it is equivalent to a Bernstein polynomial restricted to be a density. The larger $J$ is, the more flexible the prior can be. This prior has the property that if $\theta_j = 1/J$ for all $j$ then $p(\rho_i|\theta) = \textsf{Uniform}(\rho_i|0,1)$.
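As a quick illustrative sketch (Python, with an arbitrary choice of $J$), the mixture density is straightforward to evaluate, and the equal-weights check at the end recovers the $\textsf{Uniform}(0,1)$ density:

```python
# A sketch of the beta-mixture (Bernstein) prior p(rho | theta).
import numpy as np
from scipy.stats import beta

J = 10  # illustrative; larger J gives a more flexible prior

def bernstein_prior(rho, theta):
    """Evaluate p(rho | theta) = sum_j theta_j * Beta(rho | j, J - j + 1)."""
    rho = np.asarray(rho, dtype=float)
    dens = np.zeros_like(rho)
    for j in range(1, J + 1):
        dens += theta[j - 1] * beta.pdf(rho, j, J - j + 1)
    return dens

# Equal weights theta_j = 1/J recover the Uniform(0, 1) density.
theta_uniform = np.full(J, 1.0 / J)
print(bernstein_prior([0.1, 0.5, 0.9], theta_uniform))  # approximately [1. 1. 1.]
```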

Finally, let \begin{equation} p(\theta) = \textsf{Dirichlet}(\theta|\lambda) , \end{equation} where $\lambda = (\lambda_1, \ldots, \lambda_J)$. Note $E[\theta] = \lambda/\lambda_0$ where $\lambda_0 = \sum_{j=1}^J \lambda_j$. Also note that \begin{equation} p(\rho_i) = \int p(\rho_i|\theta)\,p(\theta)\,d\theta = \sum_{j=1}^J E[\theta_j]\,f_j(\rho_i) . \end{equation} If (for example) $\lambda_j = 1/J$ for all $j$, then $p(\rho_i) = \textsf{Uniform}(\rho_i|0,1)$.

A Gibbs sampler is straightforward to implement as long as \begin{equation} L_{ij} = \int p(Y_i|\rho_i)\,f_j(\rho_i)\,d\rho_i \end{equation} is easy to compute for all $i$ and all $j$. It may be possible to integrate out nuisance parameters $\phi_i$ at this stage: \begin{equation} L_{ij} = \iint p(Y_i|\rho_i,\phi_i)\,f_j(\rho_i)\,p(\phi_i)\,d\phi_i\,d\rho_i . \end{equation}
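For illustration, here is a minimal Gibbs-sampler sketch in Python. It assumes the matrix $L_{ij}$ has already been computed (the toy run below simply fills it with random numbers) and augments the model with a latent component indicator $z_i$ for each study; both full conditionals below are exact for this mixture model:

```python
# A minimal Gibbs sampler for theta given precomputed L[i, j] = L_ij.
import numpy as np

rng = np.random.default_rng(0)

def gibbs_theta(L, lam, n_iter=2000):
    """L: (N, J) integrated likelihoods; lam: (J,) Dirichlet hyperparameters."""
    N, J = L.shape
    theta = rng.dirichlet(lam)
    draws = np.empty((n_iter, J))
    for t in range(n_iter):
        # z_i | theta, Y_i: categorical with probabilities prop. to theta_j * L_ij
        probs = theta * L
        probs /= probs.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(J, p=probs[i]) for i in range(N)])
        # theta | z: Dirichlet(lambda + component counts)
        counts = np.bincount(z, minlength=J)
        theta = rng.dirichlet(lam + counts)
        draws[t] = theta
    return draws

# Toy run: N = 5 studies, J = 10 components, fabricated L_ij values.
L = rng.random((5, 10))
draws = gibbs_theta(L, lam=np.full(10, 1.0 / 10))
print(draws.mean(axis=0))
```

The posterior mean of the $\theta$ draws then yields the summary of interest directly, since \begin{equation} p(\rho_{N+1}|Y_{1:N}) = \sum_{j=1}^J E[\theta_j|Y_{1:N}]\,f_j(\rho_{N+1}) . \end{equation}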

mef