Say I have two vectors of length N,
x = [1, 10, 12, ..., 5, 6]
y = [2, 11, 10, ..., 7, 9]
I compute the Kendall tau-b rank-order correlation on these two vectors and extract a p-value. If I take the same two vectors but append additional "null" values to the end of each,
x = [1, 10, 12, ..., 5, 6, 0, 0, 0, ..., 0]
y = [2, 11, 10, ..., 7, 9, 0, 0, 0, ..., 0]
and compute the statistic again, I get a much more significant p-value. Why is this? Since the extra zeros at the end count as ties in both vectors, from what I've read about the statistic I don't think they should enter the calculation of tau-b or its variance at all.
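For reference, my understanding of the formula is tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty)), where C and D are the numbers of concordant and discordant pairs, Tx counts pairs tied only in x, and Ty counts pairs tied only in y; pairs tied in both vectors appear in neither the numerator nor the denominator, which is why I would expect the appended zeros to drop out.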
A simple example in Python is
import numpy
from scipy.stats import kendalltau
x = numpy.random.rand(20).tolist()
y = numpy.random.rand(20).tolist()
z = [0]*20
# prints (tau, p-value)
print(kendalltau(x, y))
# (0.042105263157894736, 0.79520761719370014)
print(kendalltau(x+z, y+z))
# (0.69152542372881387, 3.2901769458112632e-10)
I have tested this in several languages (Python, R, MATLAB, Mathematica) and keep getting the same behavior. Can someone help me understand why these extra zeros influence the p-value so strongly?
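To see which parts of the formula actually change when the zeros are appended, I also tried a brute-force pair count. The pair_counts helper below is my own (hypothetical) implementation that just applies the definitions of C, D, and the tie counts directly, alongside scipy's result:

import numpy as np
from scipy.stats import kendalltau

def pair_counts(x, y):
    """Count concordant (C) and discordant (D) pairs, plus pairs tied
    only in x (Tx), only in y (Ty), or in both (Txy)."""
    C = D = Tx = Ty = Txy = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                Txy += 1
            elif dx == 0:
                Tx += 1
            elif dy == 0:
                Ty += 1
            elif dx * dy > 0:
                C += 1
            else:
                D += 1
    return C, D, Tx, Ty, Txy

np.random.seed(0)  # fix the data so the two runs are comparable
x = np.random.rand(20).tolist()
y = np.random.rand(20).tolist()
z = [0] * 20

for a, b in [(x, y), (x + z, y + z)]:
    C, D, Tx, Ty, Txy = pair_counts(a, b)
    # tau_b = (C - D) / sqrt((C + D + Tx) * (C + D + Ty));
    # pairs tied in both samples (Txy) appear in neither term
    tau_b = (C - D) / np.sqrt((C + D + Tx) * (C + D + Ty))
    print(C, D, Tx, Ty, Txy, tau_b, kendalltau(a, b))

Comparing the two printouts shows which of C, D, Tx, Ty, and Txy the appended zeros actually move, but I still don't see why the p-value should shift as dramatically as it does.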