
I have a classification tree for which the balanced accuracy on the test set is higher than the plain accuracy. I thought balanced accuracy could at most equal the accuracy, never exceed it. Can anyone explain in which situations the balanced accuracy can be higher than the accuracy?

1 Answer


Let

$$ a:=TP,\quad b:= TN,\quad c:=TP+FN,\quad d:=TN+FP. $$

Then accuracy and balanced accuracy are

$$ Acc=\frac{a+b}{c+d},\quad BAcc=\frac{a}{2c}+\frac{b}{2d}, $$

or, over the common denominator $cd(c+d)$,

$$ Acc=\frac{acd+bcd}{cd(c+d)},\quad BAcc=\frac{\frac{1}{2}ad(c+d)+\frac{1}{2}bc(c+d)}{cd(c+d)}. $$

Therefore,

$$ Acc<BAcc $$

is equivalent (because the common denominator $cd(c+d)$ is positive) to

$$ acd+bcd < \frac{1}{2}ad(c+d)+\frac{1}{2}bc(c+d), $$

which, after multiplying both sides by 2, expanding, and subtracting $acd+bcd$ from both sides, is equivalent to

$$ acd+bcd < ad^2+bc^2. $$
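Moving all terms to one side, the last inequality factors, which makes the condition explicit. This extra step is easy to check by expanding the right-hand side:

$$ ad^2+bc^2-acd-bcd = ad(d-c)-bc(d-c) = (d-c)(ad-bc), $$

so that

$$ Acc<BAcc \iff (d-c)(ad-bc)>0. $$

Because $a/c$ is the true positive rate and $b/d$ the true negative rate, $ad-bc>0$ is the same as $a/c>b/d$. Hence balanced accuracy exceeds accuracy exactly when the classes are imbalanced ($c\neq d$) and the classifier has the higher recall on the smaller class; when $c=d$ the two measures coincide.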

Enumerating all combinations of $a,b,c,d$ with values from 1 to 5 (the only restriction being that $a\leq c$ and $b\leq d$), we find that this is indeed very often the case:

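```r
maximum <- 5

# Enumerate small confusion matrices: aa = TP, bb = TN,
# cc = TP + FN (actual positives), dd = TN + FP (actual negatives),
# subject to aa <= cc and bb <= dd
for ( aa in 1:maximum ) {
    for ( bb in 1:maximum ) {
        for ( cc in aa:maximum ) {
            for ( dd in bb:maximum ) {
                # Acc < BAcc  <=>  acd + bcd < ad^2 + bc^2
                if ( aa*cc*dd+bb*cc*dd < aa*dd^2+bb*cc^2 ) {
                    cat("aa=",aa,", bb=",bb,", cc=",cc,", dd=",dd,
                        " ==> Acc=",(aa+bb)/(cc+dd)," < ",
                        aa/(2*cc)+bb/(2*dd),"=BAcc\n",sep="")
                }
            }
        }
    }
}
```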

yields

```
aa=1, bb=1, cc=1, dd=2 ==> Acc=0.6666667 < 0.75=BAcc
aa=1, bb=1, cc=1, dd=3 ==> Acc=0.5 < 0.6666667=BAcc
aa=1, bb=1, cc=1, dd=4 ==> Acc=0.4 < 0.625=BAcc
aa=1, bb=1, cc=1, dd=5 ==> Acc=0.3333333 < 0.6=BAcc
aa=1, bb=1, cc=2, dd=1 ==> Acc=0.6666667 < 0.75=BAcc
aa=1, bb=1, cc=2, dd=3 ==> Acc=0.4 < 0.4166667=BAcc
aa=1, bb=1, cc=2, dd=4 ==> Acc=0.3333333 < 0.375=BAcc
aa=1, bb=1, cc=2, dd=5 ==> Acc=0.2857143 < 0.35=BAcc
aa=1, bb=1, cc=3, dd=1 ==> Acc=0.5 < 0.6666667=BAcc
aa=1, bb=1, cc=3, dd=2 ==> Acc=0.4 < 0.4166667=BAcc
aa=1, bb=1, cc=3, dd=4 ==> Acc=0.2857143 < 0.2916667=BAcc
aa=1, bb=1, cc=3, dd=5 ==> Acc=0.25 < 0.2666667=BAcc
aa=1, bb=1, cc=4, dd=1 ==> Acc=0.4 < 0.625=BAcc
(...)
```
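As a cross-check, the condition $acd+bcd < ad^2+bc^2$ can be rewritten as $(d-c)(ad-bc)>0$, since expanding $(d-c)(ad-bc)$ gives $ad^2+bc^2-acd-bcd$. The minimal R sketch below, written in the same style as the loop above, recomputes both measures from confusion-matrix counts and confirms over the same grid that the two forms of the condition agree. The helper `acc_bacc` and the variable `mismatches` are hypothetical names chosen for illustration, not part of any package.

```r
# Hypothetical helper (not from any package): accuracy and balanced
# accuracy from the four confusion-matrix counts
acc_bacc <- function(TP, FN, TN, FP) {
    pos <- TP + FN   # c: actual positives
    neg <- TN + FP   # d: actual negatives
    c(Acc  = (TP + TN) / (pos + neg),
      BAcc = TP / (2 * pos) + TN / (2 * neg))
}

# First line of the output above: a = 1, b = 1, c = 1, d = 2,
# i.e. TP = 1, FN = 0, TN = 1, FP = 1
print(acc_bacc(TP = 1, FN = 0, TN = 1, FP = 1))

# Over the same grid, check that acd + bcd < ad^2 + bc^2 holds
# exactly when (d - c)(ad - bc) > 0. All quantities are small
# integers, so these comparisons are exact.
maximum <- 5
mismatches <- 0
for ( aa in 1:maximum ) {
    for ( bb in 1:maximum ) {
        for ( cc in aa:maximum ) {
            for ( dd in bb:maximum ) {
                original <- aa*cc*dd + bb*cc*dd < aa*dd^2 + bb*cc^2
                factored <- (dd - cc) * (aa*dd - bb*cc) > 0
                if ( original != factored ) mismatches <- mismatches + 1
            }
        }
    }
}
cat("mismatches:", mismatches, "\n")
```

For the example counts the helper gives Acc = 2/3 and BAcc = 0.75, matching the first line of the enumeration output, and the loop reports zero mismatches.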

Note, however, that neither accuracy nor balanced accuracy is a good measure for assessing classification models; see the question "Why is accuracy not the best measure for assessing classification models?"

Stephan Kolassa