The question makes good sense. It specifically notes that the contingency table is the result of cross-validation. Witten et al.'s Data Mining book (based around Weka) discusses a modified t-test for (repeated) cross-validation, and a t-test implicitly defines a confidence interval. Given that we have CV and each cell is an averaged statistic, CIs do exist per cell. They will most commonly be calculated for the marginal statistics, however, and then for whole-of-table statistics, either directly or via the marginal ones.
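A minimal sketch of how a corrected cross-validation t-test turns into a confidence interval for a per-fold statistic (one cell, a recall, and so on). It assumes the variance correction factor 1/k + n_test/n_train (the Nadeau and Bengio correction, which as I understand it is the one behind Weka's corrected paired t-test). Applying it to a single statistic rather than to a difference between two schemes is also an assumption. The function name is hypothetical, and the critical value is passed in rather than computed so the sketch needs only the standard library.

```python
import math
from statistics import mean, variance

def corrected_resampled_t_ci(scores, n_train, n_test, t_crit):
    """Hypothetical helper: CI for the mean of per-fold scores using the
    corrected resampled t-test variance (assumed Nadeau-Bengio form).

    scores  : one value of the statistic per fold (k = len(scores))
    n_train : training-set size of each fold
    n_test  : test-set size of each fold
    t_crit  : two-sided Student's t critical value with k - 1 degrees of freedom
    """
    k = len(scores)
    m = mean(scores)
    s2 = variance(scores)  # sample variance (k - 1 denominator)
    # The naive standard error uses 1/k. Adding n_test/n_train widens it
    # to allow for the overlap between training sets across folds.
    se = math.sqrt((1.0 / k + n_test / n_train) * s2)
    return m - t_crit * se, m + t_crit * se
```

For ten-fold CV on 100 instances, n_train=90 and n_test=10, and t_crit is about 2.262 (Student's t, 9 degrees of freedom, 95% two-sided). For r repetitions of k folds, scores has r*k entries and the degrees of freedom change accordingly.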
In the following paper I explore adapting various generalizations of the confidence intervals used for correlation to useful multiclass cases, and validate them with Monte Carlo simulation. It is difficult to give a clear recommendation, as the same measure can be overly conservative in some cases and insufficiently conservative in others. Nonetheless, a reasonable choice is suggested and illustrated in simulations across a range of parameterizations:
Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation
DMW Powers
International Journal of Machine Learning Technology 2 (1), 37-63
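The correlation-based measures named in that title can be illustrated for the binary (one-vs-rest) case. This is a minimal sketch, assuming a 2x2 table with nonzero margins. Informedness is recall + inverse recall - 1, markedness is precision + inverse precision - 1, and the Matthews correlation is their signed geometric mean. The interval uses the textbook Fisher z-transformation with standard error 1/sqrt(N - 3), treating the table total N as the sample size. That is one standard option for a correlation, not necessarily the generalization the paper recommends, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def informedness_markedness_mcc(tp, fn, fp, tn):
    """Hypothetical helper for a binary table; all four margins must be nonzero."""
    recall = tp / (tp + fn)
    inverse_recall = tn / (tn + fp)
    precision = tp / (tp + fp)
    inverse_precision = tn / (tn + fn)
    informedness = recall + inverse_recall - 1
    markedness = precision + inverse_precision - 1
    # Both share the sign of tp*tn - fp*fn, so their product is >= 0
    # (max() only guards against floating-point round-off near zero).
    mcc = math.copysign(math.sqrt(max(0.0, informedness * markedness)), informedness)
    return informedness, markedness, mcc

def fisher_z_interval(r, n, alpha=0.05):
    """Fisher z interval for a correlation r estimated from n cases
    (assumes -1 < r < 1 and n > 3)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = math.atanh(r)
    half = z / math.sqrt(n - 3)
    return math.tanh(centre - half), math.tanh(centre + half)
```

For example, informedness_markedness_mcc(40, 10, 5, 45) gives an informedness of 0.7, and fisher_z_interval can then be applied to the returned correlation with n=100.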
Recall and precision can be calculated from a contingency table by dividing a diagonal entry by the appropriate marginal sum (the real-class total for recall, the predicted-class total for precision). Their inverses can be calculated from the complements of the diagonal and the margins, or more simply by converting to binary tables. Confidence intervals can then be defined using the Wald or Wilson techniques. Agresti et al. introduced a useful rule of thumb for the normal-distribution assumption at alpha = 0.05: add 2 positive and 2 negative examples. Tony Cai shows that this is appropriate for the binomial distribution, and gives modified versions for the negative binomial (not applicable here) and the Poisson (arguably applicable, and used as an assumption in some of my derivations above).
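The steps in that paragraph can be sketched as follows: collapse a multiclass table to one-vs-rest counts for a class, then put Wald, Wilson, or "add 2 and 2" intervals on recall, precision and their inverses. This is a minimal sketch with hypothetical function names. It assumes rows are real classes and columns are predicted classes, and it treats the summed cross-validated counts as a single binomial sample, which ignores any dependence between folds.

```python
import math
from statistics import NormalDist

def z_value(alpha=0.05):
    """Two-sided standard normal critical value (about 1.96 for alpha = 0.05)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def wald(x, n, alpha=0.05):
    """Wald interval p_hat +/- z*sqrt(p_hat*(1 - p_hat)/n); zero width at p_hat = 0 or 1."""
    z, p = z_value(alpha), x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson(x, n, alpha=0.05):
    """Wilson score interval, written in terms of counts."""
    z2 = z_value(alpha) ** 2
    centre = (x + z2 / 2) / (n + z2)
    half = math.sqrt(z2) / (n + z2) * math.sqrt(x * (n - x) / n + z2 / 4)
    return centre - half, centre + half

def plus_two(x, n, alpha=0.05):
    """Rule of thumb: add 2 positives and 2 negatives, then apply Wald
    (intended for alpha = 0.05; bounds are not clipped to [0, 1])."""
    z, n_t = z_value(alpha), n + 4
    p = (x + 2) / n_t
    half = z * math.sqrt(p * (1 - p) / n_t)
    return p - half, p + half

def one_vs_rest(table, k):
    """Binary counts (tp, fn, fp, tn) for class k of a square table
    (assumed layout: rows = real class, columns = predicted class)."""
    total = sum(map(sum, table))
    tp = table[k][k]
    fn = sum(table[k]) - tp                   # real k, predicted as something else
    fp = sum(row[k] for row in table) - tp    # predicted k, really something else
    return tp, fn, fp, total - tp - fn - fp

def class_intervals(table, k, method=wilson, alpha=0.05):
    """Intervals for recall, precision and their inverses for class k."""
    tp, fn, fp, tn = one_vs_rest(table, k)
    return {
        "recall": method(tp, tp + fn, alpha),             # diagonal / real margin
        "precision": method(tp, tp + fp, alpha),          # diagonal / predicted margin
        "inverse_recall": method(tn, tn + fp, alpha),
        "inverse_precision": method(tn, tn + fn, alpha),
    }
```

For a table such as [[50, 5, 5], [10, 40, 10], [0, 5, 55]], class_intervals(table, 0) puts an interval around the recall 50/60. Passing method=wald or method=plus_two swaps the technique. Wilson and the add-2 rule behave much better than Wald when a count is near 0 or n.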
The Poisson modification is probably the most applicable here (think in terms of when another PpRr Predicted/Real pair might arrive), as it focuses only on the class of interest and does not distribute errors or negatives among the other classes. It adds two more arrivals to a cell before calculating the statistics relating to that cell.
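The "add two arrivals" idea can be sketched for a single cell count x. The score interval solves (x - lam)^2 = z^2 * lam, which centres it on x + z^2/2 (about x + 1.92 at alpha = 0.05). The rule-of-thumb version adds 2 to the count and then applies a Wald-type Poisson interval lam +/- z*sqrt(lam). That second form is an assumption made for illustration; the exact interval in Cai's paper may differ. Both function names are hypothetical.

```python
import math
from statistics import NormalDist

def poisson_score_interval(x, alpha=0.05):
    """Score interval for a Poisson mean from one observed count x:
    the two roots of (x - lam)**2 = z**2 * lam."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = x + z * z / 2
    half = z * math.sqrt(x + z * z / 4)
    return centre - half, centre + half

def poisson_plus_two(x, alpha=0.05):
    """Assumed rule-of-thumb form: add two arrivals to the cell, then use
    lam +/- z*sqrt(lam), clipping the lower bound at zero."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lam = x + 2
    half = z * math.sqrt(lam)
    return max(0.0, lam - half), lam + half
```

Dividing both bounds by the relevant margin gives a rough interval for the corresponding rate, provided that margin is treated as fixed.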
Wei Pan (2001) derives some other possible measures based on the binomial distribution and the t-test.
The Cai paper is here:
http://www-stat.wharton.upenn.edu/~tcai/paper/Plugin-Exp-CI.pdf
My paper is here:
http://dspace2.flinders.edu.au/xmlui/bitstream/handle/2328/27165/Powers%20Evaluation.pdf
This is something I'm still exploring, hence I return periodically to see whether there are any new or useful contributions on the topic.