I'm confused by the following, and I haven't been able to dig up the answer elsewhere.
I'm trying to learn R while doing some statistics, and, as an exercise, I try to double-check the results of the built-in R functions by also doing these 'by hand', as it were, in R. However, for the Kruskal-Wallis test I keep getting different results, and I can't figure out why.
For example, I'm looking at the following data handed out in an exercise
activity <- c(2, 4, 3, 2, 3, 3, 4, 0, 4, 3, 4, 0, 0, 1, 3, 1, 2, 0, 3, 1, 0, 3, 4, 0, 1, 2, 2, 2, 3, 2)
group <- c(rep("A", 11), rep("B", 10), rep("C", 9))
group <- factor(group)
data.raw <- data.frame(activity, group)
And I want to analyse activity by group. First I run a Kruskal-Wallis test using the in-built R function
kruskal.test(activity ~ group, data = data.raw)
Which returns $H = 8.9056$.
To double-check, I try doing the same 'by hand' in R, with the following (no doubt helpless) code
rank <- rank(activity)
data.rank <- data.frame(rank, group)
rank.sum <- aggregate(rank ~ group, data = data.rank, sum)
x <- rank.sum[1,2]^2 / 11 + rank.sum[2,2]^2 / 10 + rank.sum[3,2]^2 / 9
H <- (12 / (length(activity) * (length(activity) + 1))) * x - 3 * (length(activity) + 1)
H
Which is meant to reflect the following formula:
$$ H =\frac{12}{N(N+1)}\sum_{i = 1}^g \left(\frac{R^2_i}{n_i} \right) - 3(N + 1)$$
Where $N$ is the total number of observations, $g$ is the number of groups, $n_i$ is the number of observations in the $i$th group, and $R_i$ is the sum of ranks of the $i$th group.
And now I get $H = 8.499$, which, adding to my confusion, is also the answer given for the exercise in question. I've tried this for a couple of different data sets, and I tend to get a slightly higher value for $H$ using the in-built function.
I've tried searching to figure out what I'm doing wrong or failing to understand, but to no avail. Can anyone help me understand why the inbuilt kruskal.test
function returns a different value from the one I get by spelling things out?