
I have been searching for a suitable metric for this for a while and still could not find one. More specifically, my problem is as follows.

I have a ranked golden corpus. For example, consider that it looks as follows.

name, rank
n1,   1
n2,   2
n3,   3
n4,   4
n5,   5
... and so on

Suppose I have the top-5 results of two recommendation methods as follows.

model1 = [n18, n12, e1, n19, n11]
model2 = [n1, n2, e1, n3, e3]

Precision@5 of model1 is 0.8 (e1 is the only incorrect item; the other four appear in the golden corpus), while precision@5 of model2 is 0.6 (e1 and e3 are incorrect).
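For concreteness, here is a minimal sketch of that computation. It assumes the golden corpus continues n1 through n20 (as implied by "and so on"); precision_at_k is a hypothetical helper written for this example, not an existing library function.

# Precision@k: fraction of the top-k recommendations present in the golden corpus.
golden_rank = {f"n{i}": i for i in range(1, 21)}   # assumed corpus: n1..n20, name -> golden rank

model1 = ["n18", "n12", "e1", "n19", "n11"]
model2 = ["n1", "n2", "e1", "n3", "e3"]

def precision_at_k(recommended, relevant, k=5):
    """Count how many of the top-k items appear in the golden corpus, divided by k."""
    return sum(item in relevant for item in recommended[:k]) / k

print(precision_at_k(model1, golden_rank))  # 0.8
print(precision_at_k(model2, golden_rank))  # 0.6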

However, if we look closely, model2 clearly performs better: its top recommendations include the most highly ranked elements of the golden corpus (n1, n2, n3), whereas model1 only retrieves elements ranked much further down.

I looked at measures such as precision@k and mean average precision@k, but none of them seems to capture the point I am trying to validate.

Please let me know if there is a suitable metric that takes the golden ranking into account.

I am happy to provide more details if needed.

EmJ
    Check https://stats.stackexchange.com/questions/159657/metrics-for-evaluating-ranking-algorithms – Tim Nov 29 '19 at 07:25
  • @Tim this is very helpful. I think NDCG suits my task. thank you :) – EmJ Dec 01 '19 at 00:17
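Since the comment above points to NDCG, here is a minimal sketch of NDCG@5 on the same example. The graded relevance rel = N - rank + 1 derived from the golden rank is an assumption made only for illustration (any monotone mapping of the golden rank would do); with it, model2 scores markedly higher than model1, which is exactly the behaviour asked for.

import math

golden_rank = {f"n{i}": i for i in range(1, 21)}                         # assumed corpus: n1..n20
N = len(golden_rank)
relevance = {name: N - rank + 1 for name, rank in golden_rank.items()}   # assumed grading: rank 1 is most relevant

def dcg_at_k(items, k=5):
    """Discounted cumulative gain: relevance of each item discounted by log2 of its position."""
    return sum(relevance.get(item, 0) / math.log2(pos + 1)
               for pos, item in enumerate(items[:k], start=1))

def ndcg_at_k(items, k=5):
    """DCG normalised by the DCG of the ideal ordering (golden items in golden order)."""
    ideal = sorted(golden_rank, key=golden_rank.get)   # n1, n2, n3, ...
    return dcg_at_k(items, k) / dcg_at_k(ideal, k)

model1 = ["n18", "n12", "e1", "n19", "n11"]
model2 = ["n1", "n2", "e1", "n3", "e3"]

print(ndcg_at_k(model1), ndcg_at_k(model2))  # model2 scores clearly higher than model1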

0 Answers