Hypothesis Testing: Is a Tool Better Than Random?

Question

I am testing a tool that tries to select the correct outcome. I am trying to do significance testing to see if the tool is better than choosing the outcome at random.

It picks from 4 categories, and I have a list of the correct category, and the one the tool picked.

What test should I use?

Glen_b · Accepted Answer · 2018-12-31T03:43:54.123

Consider that you're interested in whether the proportion of times correct is greater than the proportion you'd reasonably get under random guessing (presumably with equal probability on each outcome).

With 4 outcomes, therefore, the chance of getting it right by random guessing would be $\frac14$. Assuming independence of trials, the number of correct guesses under the null hypothesis would be $\text{binomial}(n, \frac14)$, where $n$ is the number of trials (attempts at guessing). This leads to a binomial test (see the example at the link involving testing whether a die rolls too many 6's, which is at heart the same problem as yours with a different number of outcomes).
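As a minimal sketch of the exact binomial test described above, the one-sided p-value is the upper tail $P(X \ge k)$ for $X \sim \text{binomial}(n, \frac14)$, where $k$ is the observed number of correct picks. The function name and the counts below (`n_trials`, `n_correct`) are hypothetical and only for illustration; substitute your own data.

```python
from math import comb


def binomial_upper_tail_p(k, n, p0=0.25):
    """Return P(X >= k) for X ~ Binomial(n, p0).

    This is the one-sided p-value for H0: p = p0 against H1: p > p0,
    assuming independent trials.
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical data: the tool was right 35 times out of 100 attempts.
n_trials = 100
n_correct = 35

p_value = binomial_upper_tail_p(n_correct, n_trials)
print(f"one-sided exact p-value: {p_value:.4f}")
```

A small p-value suggests the tool is correct more often than equal-probability guessing would explain. If the categories should not be guessed with equal probability, `p0` would need to change accordingly.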

If $n$ is large you could use a normal approximation (leading to the typical one-sample proportions test covered in a non-mathematical introductory stats text), but more generally you can base a test directly off the binomial.
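For large $n$, the normal approximation can be sketched as a one-sided one-sample proportion z-test, with $z = (\hat p - p_0)/\sqrt{p_0(1-p_0)/n}$ and $p_0 = \frac14$. The function name and the counts are hypothetical, and no continuity correction is applied here, which is a simplifying assumption.

```python
from math import erfc, sqrt


def one_sided_proportion_z_test(k, n, p0=0.25):
    """Return (z, p_value) for H0: p = p0 against H1: p > p0.

    Uses the normal approximation to the binomial, without a
    continuity correction.
    """
    p_hat = k / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / standard_error
    p_value = 0.5 * erfc(z / sqrt(2))  # upper-tail normal probability
    return z, p_value


# Hypothetical data: 35 correct picks out of 100 attempts.
z_stat, p_approx = one_sided_proportion_z_test(35, 100)
print(f"z = {z_stat:.3f}, one-sided p-value = {p_approx:.4f}")
```

With small $n$, or when the expected number of correct guesses $n p_0$ is small, the exact binomial tail is the safer choice.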

From the way the question was phrased, you presumably seek a one-tailed test; that is, the alternative of interest is that the proportion correct exceeds $\frac14$.

[Alternatively, in place of the normal approximation to the binomial, you could perform a chi-squared goodness of fit test (two outcomes, correct and incorrect, with probabilities 1/4 and 3/4 under the null), but this would not allow a one-tailed test.]
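A rough sketch of that chi-squared alternative, collapsing the results to two cells (correct, incorrect) with null probabilities 1/4 and 3/4. With one degree of freedom, the upper-tail chi-squared probability can be written with `math.erfc`, so no external library is assumed. The function name and counts are hypothetical.

```python
from math import erfc, sqrt


def chi_squared_gof_two_cells(k, n, p0=0.25):
    """Return (chi2, p_value) for a two-cell goodness-of-fit test.

    Cells are 'correct' (expected n*p0) and 'incorrect' (expected n*(1-p0)).
    The statistic has 1 degree of freedom under H0.
    """
    expected_correct = n * p0
    expected_incorrect = n * (1 - p0)
    chi2 = ((k - expected_correct) ** 2 / expected_correct
            + ((n - k) - expected_incorrect) ** 2 / expected_incorrect)
    # For 1 df: P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: 35 correct picks out of 100 attempts.
chi2_stat, p_two_sided = chi_squared_gof_two_cells(35, 100)
print(f"chi2 = {chi2_stat:.3f}, p-value = {p_two_sided:.4f}")

# The statistic is identical for 15 correct (worse than chance by the same
# margin), which is why the test cannot distinguish direction.
chi2_below, _ = chi_squared_gof_two_cells(15, 100)
```

Because the statistic squares the deviations, a tool doing worse than chance by the same margin yields the same p-value.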

It depends on what exactly "random guessing" means. Consider the case where the 4 categories are imbalanced, & you want to include that information. I might see if the OP can form a confusion matrix (cf., [How to calculate information included in R's confusion matrix](https://stats.stackexchange.com/a/253435/7290)). — gung - Reinstate Monica, Dec 31 '18 at 03:52
I agree with the point (indeed my answer hints at this issue in the first sentence). Nevertheless, I expect (as stated in my answer) that the OP intends the random choice of outcomes to be with equal probability. — Glen_b, Dec 31 '18 at 03:54
