0

I have the distribution of a sample statistics which is given by a noncentral t distribution. This distribution is dependent on the population correlation ($\rho$) which is unknown. However I have the predictive distribution of $\rho$ given an observed Pearson correlation $r$. How can I get the distribution of the sample statistics without $\rho$?

Specifically, the population is assumed binormal with means that are $\Delta$ units apart; the variances of the two measures is equal ($\sigma$) but there is correlation ($\rho$). I am interested in $\delta = \Delta / \sigma$.

The distribution of the observed standardized mean difference $d$ ($m_1 - m_2$ divided by the mean standard deviations $(s_1 -s_2)/2$) is given by

$$ \sqrt{\frac{n}{2(1-\rho)}} d \sim t'_v \bigg( \sqrt{\frac{n}{2(1-\rho)}} \delta \bigg) $$

where the degree of freedom $v$ is given by $2(n-1)/(1+\rho^2)$ (see this for the df or this for the non-central t).

To get a distribution based on the observed $r$, say $F(d \;\vert\; \delta, r)$, I intuitively believed that I could "integrate out" the correlation with

$$ \int_{-1}^{+1} f(\rho \;\vert\; r ) F( d \;\vert\; \delta, \rho) \;\mathrm{d} \rho $$

where $f(\rho \;\vert\; r)$ is the predictive density distribution of $\rho$ given an observed $r$ and $F(d \;\vert\; \delta,\rho)$ is the above cumulative distribution function.

However, this does not seem to be the right approach (running extensive simulations, the quantiles estimated from numerical integration are close but not exact). Is my approach sound? is there a difference approach?

Denis Cousineau
  • 189
  • 1
  • 14
  • 1
    Your approach seems to mix a Bayesian quantity ($f(\rho|r)$) with a frequentist calculation. In a purely frequentist approach you will have to calculate something like $F(d,r|\delta,\rho)$ – J. Delaney Feb 20 '22 at 11:15
  • @delaney So you would recommend that I find the joint distribution and then, I find its marginal, is that right? – Denis Cousineau Feb 20 '22 at 12:49
  • It depends on what your end goal is. In a frequentist approach you can't generally eliminate the dependance on the true parameters (you can only run a simulation with some specific $\rho$ and $\delta$), so you can calculate things like $F(d,r|\rho,\delta)$ or $F(d|r,\rho,\delta)$. However for multivariate normal data $r$ is a *sufficient statistic* for $\rho$, meaning that $F(d|r,\rho,\delta)=F(d|r,\delta)$ is independent of $\rho$ – J. Delaney Feb 20 '22 at 14:18
  • @delaney: the fact that $r$ is a sufficient statistic for $\rho$ is a good news. My end goal it to have $F(d \;\vert\; r, \delta)$. – Denis Cousineau Feb 20 '22 at 15:04

0 Answers0