Difference between point-biserial and rank-biserial correlations

Question

According to the Wikipedia article, the point-biserial correlation is just the Pearson correlation where one variable is continuous and the other is dichotomous (e.g. Yes/No, Male/Female). However, the article later introduces the rank-biserial correlation, which is a correlation measure between a dichotomous variable and an ordinal/ranked variable:

$r_{rb}=2(M_1-M_0)/n$

where $M_1$ and $M_0$ are the mean ranks in the continuous/ordinal variable, in groups "1" and "0", respectively, and $n=n_1+n_0$ is the total sample size.

What is the difference? Is rank-biserial correlation related to Pearson correlation?

This Q&A http://stats.stackexchange.com/questions/105542/proof-of-point-biserial-correlation-being-a-special-case-of-pearson-correlation?rq=1 discusses the relationship between point-biserial and Pearson so it seems unlikely that what you suggest holds. Note the close relationship between rank-biserial and Mann Whitney U as stated in the article you cite. — mdewey, Mar 12 '17 at 10:02

ttnphns · Accepted Answer · 2017-03-12T15:52:09.767

The Wikipedia formula for the "rank-biserial correlation" that you show was introduced by Glass (1966), and it is not equivalent to the usual Pearson $r$ computed on rank data (that is, the $r$ which is then actually Spearman's $\rho$).

Let us define $Y$ to be the quantitative variable already turned into ranks, and $X$ to be the dichotomous variable with groups coded 1 and 0 (total sample size $n=n_1+n_0$).

Knowing the formula of Pearson $r$, and noting the following identities that hold when $Y$ consists of ranks and $X$ is a 1-0 dichotomy,

$\sum XY= \sum Y_{x=1}=R_1$ (Sum of ranks in group coded 1),

$\sum X = \sum X^2 = n_1$,

$\sum Y = n(n+1)/2$,

$\sum Y^2 = n(n+1)(2n+1)/6$,

substitute, and get the Pearson $r$ (= Spearman $\rho$) formula in the form:

$r= \frac{2R_1-n_1(n+1)}{\sqrt{n_1n_0(n^2-1)/3}}$.
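The substitution can be spelled out as follows. This is only a sketch, and it assumes there are no tied ranks, so that the expressions for $\sum Y$ and $\sum Y^2$ above hold exactly. Starting from the computational form $r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$, the three pieces are:

$n\sum XY - \sum X \sum Y = nR_1 - n_1 n(n+1)/2 = \frac{n}{2}[2R_1 - n_1(n+1)]$,

$n\sum X^2 - (\sum X)^2 = n n_1 - n_1^2 = n_1 n_0$,

$n\sum Y^2 - (\sum Y)^2 = n^2(n+1)(2n+1)/6 - n^2(n+1)^2/4 = n^2(n^2-1)/12$.

Hence $r = \frac{\frac{n}{2}[2R_1 - n_1(n+1)]}{n\sqrt{n_1 n_0 (n^2-1)/12}}$, and cancelling $n/2$ from numerator and denominator gives the expression above.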

Now make the same substitutions in Glass's "rank-biserial correlation" to obtain:

$r_{rb}= \frac{2R_1-n_1(n+1)}{n_1n_0}$.
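For completeness, here is a sketch of the intermediate steps, using only the question's definition $r_{rb}=2(M_1-M_0)/n$ and the fact that all ranks sum to $n(n+1)/2$:

$M_1 = R_1/n_1$, $\quad M_0 = [n(n+1)/2 - R_1]/n_0$,

$M_1 - M_0 = \frac{n_0 R_1 + n_1 R_1 - n_1 n(n+1)/2}{n_1 n_0} = \frac{n[2R_1 - n_1(n+1)]}{2 n_1 n_0}$,

and multiplying by $2/n$ leaves $\frac{2R_1 - n_1(n+1)}{n_1 n_0}$.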

You can see that the numerators are identical but the denominators differ. So Glass's $r_{rb}$ is not a true Pearson/Spearman correlation. (The point-biserial correlation is a true Pearson correlation.)
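A small numerical check may make the difference concrete. The following is a minimal sketch in plain Python, not anyone's reference implementation: the data values and the helper names (`ranks_no_ties`, `pearson_r`, `glass_rank_biserial`) are made up for illustration, and the ranking helper assumes there are no ties.

```python
from math import sqrt

def ranks_no_ties(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def pearson_r(x, y):
    """Ordinary Pearson correlation from centered sums of products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def glass_rank_biserial(x, ranks):
    """Glass's r_rb = 2 * (M1 - M0) / n, with M1, M0 the group mean ranks."""
    g1 = [r for g, r in zip(x, ranks) if g == 1]
    g0 = [r for g, r in zip(x, ranks) if g == 0]
    return 2 * (sum(g1) / len(g1) - sum(g0) / len(g0)) / len(x)

# Hypothetical toy data: X is the 1/0 group code, y_raw the quantitative variable.
x = [1, 1, 1, 0, 0, 0, 0, 1]
y_raw = [9.1, 6.4, 7.7, 1.2, 3.0, 2.5, 5.8, 4.9]
y = ranks_no_ties(y_raw)  # Y, already turned into ranks

n, n1 = len(x), sum(x)
n0 = n - n1
R1 = sum(r for g, r in zip(x, y) if g == 1)  # sum of ranks in group 1
numerator = 2 * R1 - n1 * (n + 1)            # shared by both formulas

r_pearson = pearson_r(x, y)                              # Pearson r on ranks
r_closed = numerator / sqrt(n1 * n0 * (n * n - 1) / 3)   # closed form for r
r_rb = glass_rank_biserial(x, y)                         # Glass, via mean ranks
r_rb_closed = numerator / (n1 * n0)                      # closed form for r_rb

print(f"Pearson/Spearman on ranks: {r_pearson:.4f} (closed form {r_closed:.4f})")
print(f"Glass rank-biserial:       {r_rb:.4f} (closed form {r_rb_closed:.4f})")
```

On this toy data the Pearson/Spearman value is about 0.764 while Glass's coefficient is 0.875. Each closed form agrees with its direct computation, and the two coefficients differ only through the denominator.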

I haven't read Glass's original paper or any reviews of it, so I hesitate to say what the rationale behind this coefficient is, or whether it has any advantage over the Pearson/Spearman correlation.
