I have written a function (in R-language) to perform a Theil-Sen regression with large sample approximation (please see below). It seems to work well for the one-sided case as illustrated in Hollander & Wolfe (2nd Edition Ch. 9)
NP.lm <-function(dat, X, Y, alpha) {
dat <- dat[order(dat[,X]),]
n <- nrow(dat)
combos <- combn(n, 2)
i.s <- combos[1,]
j.s <- combos[2,]
Y.num <- dat[j.s,Y] - dat[i.s,Y]
X.dom <- dat[j.s,X] - dat[i.s,X]
Z.p <- qnorm(alpha/2, lower.tail=F)
N <- (n*(n-1))/2
s <- (Y.num/X.dom)[X.dom != 0]
C.stat <- sum(sign(s))
slope <- median(s, na.rm=TRUE)
intercept <- median((dat[,Y] - slope*dat[,X]), na.rm=TRUE)
C.star <- C.stat / sqrt(n*(n-1)*((2*n)+5)/18)
p.val <- pnorm(-abs(C.star))
out <- data.frame(colnames(dat[X]), colnames(dat[Y]), slope, intercept, p.val)
out
}
Example usage
set.seed(123) # to make the example fully reproducible
sigma <- as.matrix(data.frame(c(1, .7), c(.7, 1)))
N <- 50
dat <- data.frame( matrix(rnorm(N*nrow(sigma)), N, nrow(sigma)) %*%
chol(sigma) ) # cor structure via Choleski Decomp.
colnames(dat) <- c("X", "Y")
summary(lm(Y~X, data=dat))
NP.lm(dat=dat, X=1, Y=2, alpha=0.05)
colnames.dat.X.. colnames.dat.Y.. slope intercept p.val
1 X Y 0.6719583 0.08337517 1.227402e-07
# This p-value can't be right
Could someone please tell me what I am doing wrong, and how to make the test 2-sided.
Thank you so much for your help!