Say I have two vectors of length N,
x = [1, 10, 12, ..., 5, 6]
y = [2, 11, 10, ..., 7, 9]
I compute the Kendall tau-b rank-order correlation on these two vectors and extract a p-value. If I take the same two vectors but append additional "null" values to the end of each,
x = [1, 10, 12, ..., 5, 6, 0, 0, 0, ..., 0]
y = [2, 11, 10, ..., 7, 9, 0, 0, 0, ..., 0]
and compute the statistic again, I get a much more significant p-value. Why is this? Since the extra zeros at the end count as ties in both vectors, from what I've read about the statistic I don't think they should enter the calculation of tau-b or its variance at all.
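For reference, my understanding of the formula is tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty)), where C and D are the numbers of concordant and discordant pairs, Tx counts pairs tied only in x, and Ty counts pairs tied only in y; pairs tied in both vectors appear in neither the numerator nor the denominator, which is why I would expect the appended zeros to drop out.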
A simple example in Python is
import numpy
from scipy.stats import kendalltau
x = numpy.random.rand(20).tolist()
y = numpy.random.rand(20).tolist()
z = [0]*20
# prints (tau, p-value)
print(kendalltau(x, y))
# (0.042105263157894736, 0.79520761719370014)
print(kendalltau(x+z, y+z))
# (0.69152542372881387, 3.2901769458112632e-10)
I have tested this in several languages (Python, R, MATLAB, Mathematica) and keep getting the same behavior. Can someone help me understand why these extra zeros influence the p-value so strongly?
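To see which parts of the formula actually change when the zeros are appended, I also tried a brute-force pair count. The pair_counts helper below is my own (hypothetical) implementation that just applies the definitions of C, D, and the tie counts directly, alongside scipy's result:

import numpy as np
from scipy.stats import kendalltau

def pair_counts(x, y):
    """Count concordant (C) and discordant (D) pairs, plus pairs tied
    only in x (Tx), only in y (Ty), or in both (Txy)."""
    C = D = Tx = Ty = Txy = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                Txy += 1
            elif dx == 0:
                Tx += 1
            elif dy == 0:
                Ty += 1
            elif dx * dy > 0:
                C += 1
            else:
                D += 1
    return C, D, Tx, Ty, Txy

np.random.seed(0)  # fix the data so the two runs are comparable
x = np.random.rand(20).tolist()
y = np.random.rand(20).tolist()
z = [0] * 20

for a, b in [(x, y), (x + z, y + z)]:
    C, D, Tx, Ty, Txy = pair_counts(a, b)
    # tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty));
    # pairs tied in both samples (Txy) appear in neither term
    tau_b = (C - D) / np.sqrt((C + D + Tx) * (C + D + Ty))
    print(C, D, Tx, Ty, Txy, tau_b, kendalltau(a, b))

Comparing the two printouts shows which of C, D, Tx, Ty, and Txy the appended zeros actually move, but I still don't see why the p-value should shift as dramatically as it does.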