I've got several confusion matrices, all for binary classification (negative, positive). I would like to get overall scores for all the matrices combined. The problem is that the data is not balanced at all. For example, let's look at the following 2 sets of values:
1. Number of positive examples: 2, number of negative examples: 298
----------+---------
| TN:290 | FP:8 |
--------------------
| FN:1 | TP:1 |
----------+---------
2. Number of positive examples: 46, number of negative examples: 254
----------+----------
| TN:233 | FP:21 |
---------------------
| FN:20 | TP:26 |
----------+----------
So, taking a plain average of per-matrix metrics such as Precision, Recall, etc. will not be a good representation, because each matrix would count equally no matter how many positive examples it contains. You can think of it as a one-vs-all problem, changing the "one" each time and aggregating the results to get an overall performance that takes the sample distribution into account.
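
To make the concern concrete, here is a minimal sketch in plain Python that compares the unweighted mean of per-matrix Precision and Recall with the same metrics computed on the summed counts (often called micro-averaging). The `Confusion` class and the helper names are my own, purely for illustration, and pooling the counts is only one candidate aggregate, not necessarily the right one for this case.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of one binary confusion matrix (hypothetical helper type)."""
    tn: int
    fp: int
    fn: int
    tp: int


def precision(c: Confusion) -> float:
    # TP / (TP + FP); return 0.0 when nothing was predicted positive.
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Confusion) -> float:
    # TP / (TP + FN); return 0.0 when there are no positive examples.
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


# The two matrices from the question.
matrices = [
    Confusion(tn=290, fp=8, fn=1, tp=1),
    Confusion(tn=233, fp=21, fn=20, tp=26),
]

# Unweighted mean: every matrix counts equally, regardless of its size.
macro_precision = sum(precision(m) for m in matrices) / len(matrices)
macro_recall = sum(recall(m) for m in matrices) / len(matrices)

# Pooled counts: sum the cells first, then compute the metrics once.
pooled = Confusion(
    tn=sum(m.tn for m in matrices),
    fp=sum(m.fp for m in matrices),
    fn=sum(m.fn for m in matrices),
    tp=sum(m.tp for m in matrices),
)
micro_precision = precision(pooled)
micro_recall = recall(pooled)

print(f"mean of per-matrix: precision={macro_precision:.4f} recall={macro_recall:.4f}")
print(f"pooled counts:      precision={micro_precision:.4f} recall={micro_recall:.4f}")
```

With the two matrices above, the unweighted mean Precision is about 0.33, while Precision on the pooled counts is about 0.48, because the first matrix, with only 2 positive examples, pulls the plain average down as strongly as the second one with 46.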