I have been searching for a metric to do this for a while and still could not find one. More specifically, my problem is as follows.
I have a ranked golden corpus. For example, consider that it looks as follows.
name, rank
n1, 1
n2, 2
n3, 3
n4, 4
n5, 5
and so on
Suppose I have the top-5 results of two recommendation methods as follows.
model1 = [n18, n12, e1, n19, n11]
model2 = [n1, n2, e1, n3, e3]
If we consider precision@5, model1 scores 0.8 (e1 is the only recommendation that is not in the golden corpus), while model2 scores 0.6 (e1 and e3 are incorrect).
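For concreteness, here is a minimal Python sketch of how I compute precision@5 for the two lists above. The extra golden entries n11, n12, n18, n19 (with those ranks) are my assumption, since the 0.8 figure implies they are in the corpus:

```python
# Minimal sketch: precision@k against the golden corpus.
# Assumption: n11, n12, n18, n19 are also golden items with the ranks shown.
golden = {"n1": 1, "n2": 2, "n3": 3, "n4": 4, "n5": 5,
          "n11": 11, "n12": 12, "n18": 18, "n19": 19}  # name -> golden rank

model1 = ["n18", "n12", "e1", "n19", "n11"]
model2 = ["n1", "n2", "e1", "n3", "e3"]

def precision_at_k(recommended, golden, k=5):
    """Fraction of the top-k recommendations that appear in the golden corpus."""
    return sum(1 for item in recommended[:k] if item in golden) / k

print(precision_at_k(model1, golden))  # 0.8
print(precision_at_k(model2, golden))  # 0.6
```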
However, if we look closely, model2 outperforms model1, because its top recommendations tend to be the highly ranked elements of the golden corpus.
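To illustrate what I mean (continuing the sketch above; this is just the intuition, not a metric I know of), the hits of model2 are the very top golden items, while the hits of model1 sit much deeper in the golden ranking:

```python
def golden_ranks_of_hits(recommended, golden, k=5):
    """Golden-corpus ranks of the hits among the top-k recommendations."""
    return [golden[item] for item in recommended[:k] if item in golden]

print(golden_ranks_of_hits(model1, golden))  # [18, 12, 19, 11] -> hits are deep in the golden ranking
print(golden_ranks_of_hits(model2, golden))  # [1, 2, 3]        -> hits are the top of the golden ranking
```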
I looked at measures such as precision@k and mean average precision@k, but none of them seems to capture the point I am trying to validate.
Please let me know if there is a suitable way to solve this issue.
I am happy to provide more details if needed.