How can I show if there is a bias in judging?

Question

I'm trying to demonstrate there is a bias in the judging of academic talks. There was a competition with 147 talks given in 23 sections. In each section, a judge picked the best talk from the group. I have a list of the winners of each section and believe there was a bias toward talks given earlier in each section. I have created a histogram of the winners, which I believe should be flat, since which person wins in each group should be pretty random, but it doesn't appear to be so. I want to learn more about the statistics of how I can prove this, or at least show that there may be a bias toward earlier talks. Here is the data:

winners_pos = [2,6,1,1,6,5,1,1,3,1,3,4,5,3,2,3,1,1,2,5,6,7,3]
total_in_group = [7,6,6,8,8,5,4,4,6,5,7,5,6,6,6,6,4,9,7,8,8,8,8]

where winners_pos is the presenter's order in that section and total_in_group is the total number of presentations in that group. I have created the following histogram:

It simply shows the frequency of winners by their order of presenting within their group. I am not sure whether a bias can really be calculated with so little data, but I am curious whether it can be demonstrated if it does exist.

Showing that there was a tendency to rate earlier presentations more highly doesn't necessarily imply a bias in judging. For example if order of presentation were partly related to order of submission, that might itself be the source of the difference (if earlier submissions were better on average, for example). — Glen_b, Apr 08 '17 at 04:40
The order of presentations was done by last name alphabetically, so should be somewhat random in order of submission. I'm just curious how one would show statistically if there is a true bias. — TheStrangeQuark, Apr 08 '17 at 12:52
I think it is an interesting hypothesis. Do you think that the judges' attention span decreases as time goes on? That could be a possibility if it is true. You could look at a statistical measure of association, which could indicate a tendency but not a cause. However, from your histogram, 1 is the highest at 7, 3 is at 5 and 2, 4 and 6 are all at 3. There is not a great deal of variation and, as you mentioned, the sample size is small (only 23 sessions). — Michael R. Chernick, Apr 08 '17 at 13:09
@MichaelChernick Yeah, my thought was that the judges may get bored or their attention span decreases over time. And yeah, I wasn't sure if anything could really be shown since the sample size is so small. The only other information I have about the talks is their exact time they occurred, but I'm not sure if that would be helpful at all. — TheStrangeQuark, Apr 08 '17 at 13:33

score 5 · Accepted Answer · edited Apr 13 '17 at 12:44

When there is no bias in a group of size $k$, each position $1,2,\ldots, k$ has equal chances of winning. The expected position is $\mu(k) =(k+1)/2$ and its variance is $V(k) = (k^2-1)/12$.
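As a quick sanity check on these two formulas, the following sketch compares them with direct enumeration over the positions $1,\ldots,k$ for the section sizes that occur in the data (4 through 9). The table name `check` and its column names are illustrative choices, not part of the original answer.

```r
# Closed-form mean and variance of a uniform winner position on 1..k
mu <- function(k) (k + 1) / 2
v  <- function(k) (k^2 - 1) / 12

# Compare with brute-force enumeration for section sizes 4..9
check <- t(sapply(4:9, function(k) {
  pos <- seq_len(k)                              # equally likely positions
  c(k            = k,
    mean_enum    = mean(pos),                    # expectation by enumeration
    mean_formula = mu(k),
    var_enum     = mean((pos - mean(pos))^2),    # population variance
    var_formula  = v(k))
}))
print(check)
```

Note that the enumeration uses the population variance (dividing by $k$), not R's `var()`, which divides by $k-1$.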

When, in addition, the results are independent among the sections, with sizes $k_i$ ($i=1,2,\ldots, n=23$), the expected sum of the winning positions $X_i$ is $\sum_{i=1}^n \mu(k_i)$ and the variance of that sum is $\sum_{i=1}^n V(k_i)$. With this many sections, the distribution of the sum $\sum_{i=1}^n X_i$ will be, to an excellent approximation, Normal. This provides a simple test based on

$$Z = \frac{\sum_{i=1}^n (X_i - \mu(k_i))}{\sqrt{\sum_{i=1}^n V(k_i)}},$$

which may be referred to the standard Normal distribution.

In the example, the numerator is $-13$ and the denominator is the square root of $241/3$, whence $Z=-1.45$. The test should be two-tailed (because the alternative hypothesis before examining the data would be that the winners are biased in some direction), yielding a p-value of $15\%$, which is not very small: there is only a hint of bias in these data.
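For readers who want to trace the arithmetic, these values follow from the posted data, where $\sum_i k_i = 147$ and $\sum_i k_i^2 = 987$:

$$\sum_{i=1}^n X_i = 72, \qquad \sum_{i=1}^n \mu(k_i) = \frac{147 + 23}{2} = 85, \qquad \sum_{i=1}^n V(k_i) = \frac{987 - 23}{12} = \frac{241}{3},$$

$$Z = \frac{72 - 85}{\sqrt{241/3}} \approx \frac{-13}{8.96} \approx -1.45.$$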

This plot of the data shows winning positions (scaled from $1/(k_i+1)$ to $k_i/(k_i+1)$) against the session sizes $k_i$. (Colors indicate raw winning positions.) The blue curve is a Loess smooth. (It hasn't been weighted by $1/k_i$, as it should be, but for crude exploration that's good enough.) The horizontal black line is the expected winning position when no bias exists.

Both the figure and the negative value of $Z$ suggest a slight tendency to favor earlier positions, but the test shows that a departure of this size is just what one would expect of random variation. It's not significant.

Edit: Checking

I checked this result (and thereby detected an error in the original reply, which quoted an incorrect formula for the variance) using the R software at https://stats.stackexchange.com/a/116913 to compute the exact null distribution of the sum of winning positions. Here is a plot of it (as black dots) on which have been superimposed (a) the Normal approximation used above (in gray), which obviously is excellent, and (b) the sum of the winning positions, shown as a vertical red line. Clearly it's a little removed from the middle of this distribution, but not much: there's a sizable chance of observing values much less than $72$ or greater than $98$ (which is the comparable region in the right tail).
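If the code from the linked answer is not at hand, the exact null distribution can also be obtained in base R by convolving the uniform distributions one section at a time. This is only a minimal sketch of that idea, not the code used for the figure; the names `pmf`, `support`, `s_obs`, `s_mean`, `s_mirror` and `p_exact` are introduced here for illustration.

```r
winners_pos    <- c(2,6,1,1,6,5,1,1,3,1,3,4,5,3,2,3,1,1,2,5,6,7,3)
total_in_group <- c(7,6,6,8,8,5,4,4,6,5,7,5,6,6,6,6,4,9,7,8,8,8,8)

# pmf[j] is the probability that the running sum equals
# (smallest possible sum so far) + j - 1.
pmf <- 1                                   # empty sum: 0 with probability 1
for (k in total_in_group) {
  new_pmf <- numeric(length(pmf) + k - 1)
  for (j in seq_len(k)) {                  # winner in position j, chance 1/k
    idx <- j:(j + length(pmf) - 1)
    new_pmf[idx] <- new_pmf[idx] + pmf / k
  }
  pmf <- new_pmf
}
support <- length(total_in_group):sum(total_in_group)   # possible sums

s_obs    <- sum(winners_pos)                # observed sum of winning positions
s_mean   <- sum((total_in_group + 1) / 2)   # its expectation under the null
s_mirror <- 2 * s_mean - s_obs              # equally extreme value, other tail

# The null distribution is symmetric about s_mean, so add both tails.
p_exact <- min(1, sum(pmf[support <= min(s_obs, s_mirror)]) +
                  sum(pmf[support >= max(s_obs, s_mirror)]))
p_exact
```

Because the sum is discrete, this exact two-sided p-value need not coincide with the Normal-approximation value, but it should lead to the same conclusion.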

Here is the additional code needed to produce these figures.

#
# Test the data.
#
winners_pos = c(2,6,1,1,6,5,1,1,3,1,3,4,5,3,2,3,1,1,2,5,6,7,3)
total_in_group = c(7,6,6,8,8,5,4,4,6,5,7,5,6,6,6,6,4,9,7,8,8,8,8)
mu <- function(k) (k+1)/2
v <- function(k) (k^2-1)/12
Z <- (sum(winners_pos) - sum(mu(total_in_group))) / sqrt(sum(v(total_in_group)))
p <- 2 * pnorm(-abs(Z))  # two-tailed p-value; valid whatever the sign of Z
#
# Figure 1: The data.
#
library(ggplot2)
X <- data.frame(Count=total_in_group, Winner=winners_pos)
g <- ggplot(X, aes(x=Count, y=Winner/(Count+1))) + ylim(0, 1)
g <- g + geom_smooth(size=1.5)
g <- g + geom_hline(yintercept=1/2, size=1.5) 
g <- g + ylab("Relative Position of Winner") + xlab("Session Size")
g <- g + scale_fill_gradientn(colors=terrain.colors(10))
g <- g + ggtitle("Data", "Horizontally Jittered to Resolve Overlaps")
g + geom_jitter(aes(fill=Winner), 
                position = position_jitter(width = 0.2, height = 0), 
                size=4, pch=21, alpha=0.75)
#
# Figure 2: The exact null distribution.
#
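# (die.uniform and the "+" operation on dice are defined in the code at
#  https://stats.stackexchange.com/a/116913, which must be run first.)
dice <- lapply(total_in_group, die.uniform)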
d <- dice[[1]]
for (i in 2:length(dice)) d <- d + dice[[i]]
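# -- A plotting function: omits each tail of total probability below `tol`.
#    (plot(d, ...) below dispatches here, assuming the objects have class "die".)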
plot.die <- function(x, tol=0.001, ...) {
  i <- which.max(cumsum(x$prob) >= tol)
  n <- length(x$prob)
  i.bar <- n+1 - which.max(cumsum(rev(x$prob)) >= tol)
  plot(x$value[i:i.bar], x$prob[i:i.bar], ...)
}
plot(d, pch=19,
     xlab="Sum of Winning Positions", ylab="Chance",
     main="Null Distribution",
     sub=paste("p =", signif(p, 3)))
abline(v = sum(winners_pos), col="Red", lwd=2)
curve(dnorm(x, sum(mu(total_in_group)), sqrt(sum(v(total_in_group)))), 
      add=TRUE, col="#00000040", lwd=2)
