
Suppose we want to compute accuracy for a binary classifier (assuming balanced classes):

$$\text{Acc} = \frac{TP + TN}{N},$$

where $N = TP + TN + FP + FN$.

For a pure random guesser, where each (actual) positive and negative sample has an equal chance of being classified correctly or incorrectly, we have $E(TP) = E(TN) = N/4$, so it is simple to verify that $E(\text{Acc}) = E\left[\frac{TP + TN}{N}\right] = \frac{1}{N}\left(\frac{N}{4} + \frac{N}{4}\right) = \frac{1}{2}$.
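As a sanity check, here is a minimal Monte Carlo sketch of this setup (the sample size `N`, the number of simulated classifiers `trials`, and the seed are arbitrary illustrative choices):

```python
import numpy as np

# Balanced binary labels and a uniform random guesser.
rng = np.random.default_rng(0)
N, trials = 1_000, 10_000

y_true = np.tile([0, 1], N // 2)                              # balanced classes
y_pred = rng.integers(0, 2, size=(trials, N), dtype=np.int8)  # uniform random guesses

tp = ((y_pred == 1) & (y_true == 1)).sum(axis=1)
tn = ((y_pred == 0) & (y_true == 0)).sum(axis=1)

print(tp.mean(), tn.mean())    # both ~ N/4 = 250
print(((tp + tn) / N).mean())  # ~ 0.5, i.e. E(Acc) = 1/2
```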

So, if $E(\text{Acc}) = 1/2$ for a pure (uniform) random guesser, how can we compute the theoretical value of $\text{Var}(\text{Acc})$?

Dabaso

1 Answer


When you classify $N$ samples and have a 50% chance of being correct each time, your $TP + TN$ is a binomially distributed random variable $X$ with parameters $n = N$ and $p = \frac{1}{2}$. Wikipedia tells us that

$$ \text{Var}(X)=np(1-p)=\frac{N}{4}.$$

So

$$ \text{Var}(\text{Acc}) = \text{Var}\bigg(\frac{X}{N}\bigg)=\frac{1}{N^2}\text{Var}(X) =\frac{1}{4N}.$$
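As a quick empirical check (a minimal simulation sketch; the sample sizes, trial count, and seed below are arbitrary choices), the observed variance of the accuracy matches $\frac{1}{4N}$ across several values of $N$:

```python
import numpy as np

# Empirical check of Var(Acc) = 1/(4N) for a uniform random guesser.
rng = np.random.default_rng(1)
trials = 200_000

for N in (100, 1_000, 10_000):
    acc = rng.binomial(n=N, p=0.5, size=trials) / N  # TP+TN ~ Bin(N, 1/2)
    print(N, acc.var(), 1 / (4 * N))                 # empirical vs. theoretical
```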

Also, accuracy is not a good evaluation measure.

Stephan Kolassa
  • Hello Stephan, thanks very much for your reply, it makes sense. Yes, I am very aware that using accuracy has many drawbacks. Nevertheless, I took a look at the link you shared and found the comments you wrote there very enlightening. About my question, I was just wondering how we could statistically characterize the classical Acc in the case of a pure (random) guesser. So, if I may, could we extrapolate the expression you wrote above to n_classes > 2? For example... – Dabaso Apr 03 '21 at 16:01
  • For the uniform random guesser, if p = 1/n_classes, then by the binomial formula Var(X) = np(1-p) we get Var(X) = N*(1/n_classes)*(1 - (1/n_classes)). Using this expression for Var(X) in Var(Acc), we would then arrive at Var(Acc) = (1/N)*(1/n_classes)*(1 - (1/n_classes)), indicating that Var(Acc) decreases as n_classes increases, by a factor of (1/n_classes)*(1 - (1/n_classes))? (A simulation sketch of this multiclass case appears after these comments.) – Dabaso Apr 03 '21 at 16:06
  • 1
    Yes, that makes sense, although I have to admit that I don't quite know how enlightening the variance of the accuracy of a random classifier on $n$ equiprobable classes is... – Stephan Kolassa Apr 03 '21 at 18:47
  • Hello Stephan, thanks again for your reply. I agree that using accuracy is not the best choice of classification metric. But in many projects (especially in industry, as in my case), the metrics are decided in a "top-down" manner, so using accuracy may be a requirement defined in the contract, for example. It's also common that we need to define a baseline reference of bad performance, which in this case would be a uniform random guesser. Thus, given the sample size N, I need to compute the theoretical fluctuation of ACC_baseline around its mean value as a reference for project documentation... – Dabaso Apr 03 '21 at 18:59
  • 1
    Thanks, that also makes sense. I have had my experience with imposed evaluation metrics, and I agree (and often argue) that comparing a method's performance to an extremely simple benchmark should always be done - in my case of forecasting, one should always check whether one improved on a random walk or historical mean forecast, which can be surprisingly hard. – Stephan Kolassa Apr 03 '21 at 19:04
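Following up on the multiclass extension discussed in the comments above, here is a minimal simulation sketch (the values of N, trials, the class counts k, and the seed are arbitrary illustrative choices) comparing the empirical variance against $\frac{1}{N}\cdot\frac{1}{k}\left(1 - \frac{1}{k}\right)$ for $k$ equiprobable classes:

```python
import numpy as np

# For k equiprobable classes, a uniform random guesser is correct with
# probability p = 1/k, so the number of correct predictions is Bin(N, 1/k)
# and Var(Acc) = (1/N) * (1/k) * (1 - 1/k).
rng = np.random.default_rng(2)
N, trials = 1_000, 200_000

for k in (2, 5, 10):
    acc = rng.binomial(n=N, p=1 / k, size=trials) / N
    print(k, acc.var(), (1 / N) * (1 / k) * (1 - 1 / k))  # empirical vs. theoretical
```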