
I've got several confusion matrices, all from binary classification (negative, positive). I would like to compute overall scores across all the matrices combined. The problem is that the data is not balanced at all. For example, let's look at the following 2 sets of values:

1. Number of positive examples: 2, number of negative examples: 298

    +---------+--------+
    | TN: 290 | FP: 8  |
    +---------+--------+
    | FN: 1   | TP: 1  |
    +---------+--------+
2. Number of positive examples: 46, number of negative examples: 254

    +---------+---------+
    | TN: 233 | FP: 21  |
    +---------+---------+
    | FN: 20  | TP: 26  |
    +---------+---------+

So simply averaging metrics such as precision, recall, etc. across the matrices would not be a good representation. You can think of it as a one-vs.-all problem, changing the focal class each time and aggregating the results to get an overall performance measure that takes the sample distribution into account.
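One standard way to respect the sample distribution is *micro-averaging*: pool the raw TP/FP/FN/TN counts from all matrices first, then compute the metrics once on the pooled counts, so every individual sample carries equal weight. A minimal sketch in plain Python, using the two example matrices above (the dict layout and function names are my own, not from any particular library):

```python
# Two ways to aggregate several binary confusion matrices while
# respecting class imbalance. Counts are the two examples above.

matrices = [
    {"TN": 290, "FP": 8, "FN": 1, "TP": 1},
    {"TN": 233, "FP": 21, "FN": 20, "TP": 26},
]

def micro_metrics(mats):
    """Micro-averaging: sum the raw counts across matrices, then
    compute precision/recall/F1 once on the pooled counts. A matrix
    with only 2 positives cannot dominate the aggregate."""
    tp = sum(m["TP"] for m in mats)
    fp = sum(m["FP"] for m in mats)
    fn = sum(m["FN"] for m in mats)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1

def weighted_macro_recall(mats):
    """Support-weighted macro average: compute recall per matrix,
    then weight each by its number of positives (TP + FN)."""
    supports = [m["TP"] + m["FN"] for m in mats]
    recalls = [m["TP"] / s for m, s in zip(mats, supports)]
    return sum(w * r for w, r in zip(supports, recalls)) / sum(supports)

p, r, f1 = micro_metrics(matrices)
print(f"micro: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
print(f"support-weighted recall={weighted_macro_recall(matrices):.3f}")
```

Note that for recall, weighting each matrix by its positive support is algebraically identical to micro-averaging (the supports cancel, leaving ΣTP / Σ(TP+FN)); for precision the two generally differ, since precision's denominator is TP+FP rather than the positive support.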

M.F
  • It is all right to average precision or recall obtained from several 2x2 confusion matrices (binary classification; each matrix corresponds to classification with one focal class). And you could do a weighted averaging: for example, the weight can be the frequency, or the inverted frequency, of the focal class. – ttnphns Jan 11 '22 at 13:09
  • How can a simple average be enough here, when in one focal class we only have 2 positive samples and in the other we have 46 (meaning the latter is far more statistically reliable, if you will)? By frequency, do you mean the rate of positive samples? – M.F Jan 11 '22 at 13:20
  • Your problem may not so much lie in how to aggregate precision and recall, but in the fact that precision and recall are inherently problematic. This thread is about accuracy, but it applies equally to precision and recall: [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/1352) – Stephan Kolassa Jan 11 '22 at 16:17
