Because most examples in your training set are not of the diving action. This leads to class imbalance, and accuracy is a mostly meaningless metric in the presence of significant class imbalance. Here is your confusion matrix:
|                    | Actual P | Actual N | Total |
|--------------------|---------:|---------:|------:|
| Predicted positive |        9 |        4 |    13 |
| Predicted negative |        1 |      109 |   110 |
| Total              |       10 |      113 |   123 |
Note that 113 of your 123 actions are not diving, while only 10 are. Even if I just guessed "not diving" every time, I would be right 113/123 ≈ 92% of the time, so 92% accuracy is the "floor" for any non-trivial model; a model would have to be truly broken to do worse. Sure, your model gets 96% accuracy, but it is really only edging out that constant model by about 4 percentage points. The moral of the story is that accuracy is essentially the wrong metric to look at in the presence of class imbalance: not only will it be inflated, it will be inflated by a different amount depending on prevalence, which makes it very difficult to compare performance across different classes (actions, in your case).
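To make the comparison concrete, here is a minimal sketch that reproduces the arithmetic from the counts in the table above (the variable names are just for illustration):

```python
# Accuracy of your model vs. a constant "not diving" baseline,
# using the counts from the confusion matrix above.
tp, fp = 9, 4      # predicted diving:     9 correct, 4 wrong
fn, tn = 1, 109    # predicted not diving: 1 wrong, 109 correct
total = tp + fp + fn + tn                    # 123 actions

model_accuracy = (tp + tn) / total           # (9 + 109) / 123 ≈ 0.96
baseline_accuracy = (tn + fp) / total        # always guess "not diving": 113 / 123 ≈ 0.92

print(f"model:    {model_accuracy:.3f}")     # ~0.959
print(f"baseline: {baseline_accuracy:.3f}")  # ~0.919
```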
ROC AUC is a widely used metric in these cases, one reason being that it is insensitive to changes in class prevalence. Another metric (less commonly used but closer in spirit to accuracy) is the Matthews correlation coefficient. The $F_\beta$ score (for a suitable choice of $\beta$ that encodes your own precision/recall priorities) is also a good choice. Any of these should be more useful and easier to interpret than accuracy in your case.
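If you are using scikit-learn, all three metrics have ready-made implementations. A quick sketch, assuming you have per-action binary labels (`y_true`), hard predictions (`y_pred`), and predicted scores for the "diving" class (`y_score`); the toy values below are purely illustrative:

```python
from sklearn.metrics import roc_auc_score, matthews_corrcoef, fbeta_score

y_true  = [1, 1, 1, 0, 0, 0, 0, 0]                   # 1 = diving, 0 = not diving
y_pred  = [1, 1, 0, 0, 0, 1, 0, 0]                   # hard class predictions
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.05]  # model's score for "diving"

print(roc_auc_score(y_true, y_score))        # ROC AUC: uses scores, not hard labels
print(matthews_corrcoef(y_true, y_pred))     # MCC: well behaved under class imbalance
print(fbeta_score(y_true, y_pred, beta=2.0)) # F_beta: beta > 1 weights recall more heavily
```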