As an extension of this interesting question, suppose I draw two samples $\mathbf x_1$, $\mathbf x_2$ independently from a $\textrm{Multinomial}(n,\mathbf p)$ distribution, where $\mathbf p$ is a probability vector of length $k$.

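What is the distribution (the expectation alone will suffice) of their Pearson correlation coefficient \begin{align} r = \frac{(k\mathbf x_1 - n\mathbf 1) \cdot (k\mathbf x_2 - n\mathbf 1)}{\lVert k\mathbf x_1 - n\mathbf 1\rVert \, \lVert k\mathbf x_2 - n\mathbf 1\rVert}, \end{align} where $\mathbf 1$ is the all-ones vector of length $k$, so that $n$ is subtracted from every component?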

In simulations $r^2$ appears to follow a Beta distribution, but I can't make much headway with pen and paper.
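For anyone who wants to reproduce the kind of simulation mentioned above, here is a minimal sketch using NumPy. The function names `pearson_r` and `simulate_r`, the number of draws, and the seed are my own illustrative choices, not part of the question. Draws in which either sample has all components equal make the denominator zero and are returned as NaN.

```python
import numpy as np


def pearson_r(x1, x2):
    """Pearson correlation of two count vectors, written as in the formula for r."""
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    n = x1.sum()          # both samples have the same total n
    k = x1.size           # number of categories
    a = k * x1 - n        # proportional to x1 minus its mean n/k
    b = k * x2 - n
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # r is undefined when either vector is constant across categories
    return float("nan") if denom == 0 else float(a @ b / denom)


def simulate_r(n, p, draws=10_000, seed=0):
    """Draw independent pairs from Multinomial(n, p) and return r for each pair."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    x1 = rng.multinomial(n, p, size=draws)
    x2 = rng.multinomial(n, p, size=draws)
    return np.array([pearson_r(a, b) for a, b in zip(x1, x2)])


if __name__ == "__main__":
    # Example parameters chosen only for illustration
    r = simulate_r(n=200, p=[0.5, 0.3, 0.15, 0.05])
    r = r[~np.isnan(r)]
    print("mean of r:  ", r.mean())
    print("mean of r^2:", (r ** 2).mean())
```

A histogram of `r ** 2` from this output is one way to check how Beta-like the distribution looks for a given $n$ and $\mathbf p$.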

Will
  • It cannot have a Beta distribution for the simple reason that $r^2$ can take only a finite number of possible values, whence its distribution must be *discrete*. If you wish to approximate it, then the quality of the approximation will depend on the entropy of $p$. Thus, the appearance you get in your simulations will not necessarily reveal a general pattern. – whuber Jan 18 '18 at 16:52
  • Ah, very good point. A large $n$ approximation would be fine, but I'm pretty stuck on where to go with it. – Will Jan 18 '18 at 17:01
  • Depending on $p$, a large-$n$ approximation should be easy to come by: pretend the data are Normal. – whuber Jan 18 '18 at 17:34
