I want to integrate the notion of weighting into an evaluation. I am wondering whether it is appropriate/correct to calculate precision and recall scores by weighting the true positives, false positives and false negatives. In my case, each item has a rank and/or an associated value. For example:
            rank  value  test1  test2  test3
            1     99.3          x
correct     2     87.2   x      x
            3     66.9   x      x
            ---------------------------------
incorrect   4     33.1                 x
            5     12.8   x             x
Items at ranks 1 to 3 are correct, while those at ranks 4 and 5 are incorrect. My question is: can we calculate TP, FP and FN by summing the values of the corresponding items for a given test result? For this toy example this would yield:
test1: TP = 87.2 + 66.9 = 154.1, FP = 12.8, FN = 99.3
precision = 154.1 / (154.1 + 12.8) = 0.92
recall = 154.1 / (154.1 + 99.3) = 0.61
f-score = 2 (0.92 * 0.61) / (0.92 + 0.61) = 0.73
test2: TP = 99.3 + 87.2 + 66.9 = 253.4, FP = 0, FN = 0
precision = 253.4 / (253.4 + 0) = 1.0
recall = 253.4 / (253.4 + 0) = 1.0
f-score = 2 (1.0 * 1.0) / (1.0 + 1.0) = 1.0
test3: TP = 0, FP = 33.1 + 12.8 = 45.9, FN = 99.3 + 87.2 + 66.9 = 253.4
precision = 0 / (0 + 45.9) = 0
recall = 0 / (0 + 253.4) = 0
f-score = 2 (0 * 0) / (0 + 0) = 0 (strictly 0/0, which is undefined; taken as 0 by convention)
These results all seem fine, but is the approach mathematically sound? Are there cases where it would fall apart or give unreliable results?
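For reference, here is a minimal Python sketch of the calculation described above. The function name `weighted_prf` and the dictionary layout are illustrative choices, not an established API. The selected ranks for each test are read off the worked numbers, and any 0/0 case is set to 0 by convention, which is an assumption rather than a mathematical necessity.

```python
# Value of each item keyed by rank, taken from the toy table.
values = {1: 99.3, 2: 87.2, 3: 66.9, 4: 33.1, 5: 12.8}
# Ranks 1 to 3 are the correct items.
correct = {1, 2, 3}


def weighted_prf(selected, values, correct):
    """Value-weighted precision, recall and F-score for a set of selected ranks."""
    tp = sum(values[r] for r in selected if r in correct)
    fp = sum(values[r] for r in selected if r not in correct)
    fn = sum(values[r] for r in correct if r not in selected)
    # Assumption: an undefined 0/0 ratio is reported as 0.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f_score


# Ranks returned by each test, as implied by the TP/FP/FN sums above.
tests = {"test1": {2, 3, 5}, "test2": {1, 2, 3}, "test3": {4, 5}}

for name, selected in tests.items():
    p, r, f = weighted_prf(selected, values, correct)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
```

Setting every value to 1 reduces this to the usual count-based precision and recall, which makes it easy to compare the weighted and unweighted scores on the same data.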