I have been searching for a metric to do this for a while and still could not find one. More specifically, my problem is as follows.
I have a ranked golden corpus. For example, consider that it looks as follows.
name, rank
n1, 1
n2, 2
n3, 3
n4, 4
n5, 5
and so on
Suppose I have the top-5 results of two recommendation methods as follows.
model1 = [n18, n12, e1, n19, n11]
model2 = [n1, n2, e1, n3, e3]
If we consider precision@5, model1 scores 0.8 (e1 is the only recommendation that is not in the golden corpus), while model2 scores 0.6 (e1 and e3 are incorrect).
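For concreteness, here is a minimal Python sketch of how I compute precision@5 for the two lists above. The extra golden entries n11, n12, n18, n19 (with those ranks) are my assumption, since the 0.8 figure implies they are in the corpus:

```python
# Minimal sketch: precision@k against the golden corpus.
# Assumption: n11, n12, n18, n19 are also golden items with the ranks shown.
golden = {"n1": 1, "n2": 2, "n3": 3, "n4": 4, "n5": 5,
          "n11": 11, "n12": 12, "n18": 18, "n19": 19}  # name -> golden rank

model1 = ["n18", "n12", "e1", "n19", "n11"]
model2 = ["n1", "n2", "e1", "n3", "e3"]

def precision_at_k(recommended, golden, k=5):
    """Fraction of the top-k recommendations that appear in the golden corpus."""
    return sum(1 for item in recommended[:k] if item in golden) / k

print(precision_at_k(model1, golden))  # 0.8
print(precision_at_k(model2, golden))  # 0.6
```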
However, if we look closely, model2 outperforms model1, because its top recommendations tend to be the highly ranked elements of the golden corpus.
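To illustrate what I mean (continuing the sketch above; this is just the intuition, not a metric I know of), the hits of model2 are the very top golden items, while the hits of model1 sit much deeper in the golden ranking:

```python
def golden_ranks_of_hits(recommended, golden, k=5):
    """Golden-corpus ranks of the hits among the top-k recommendations."""
    return [golden[item] for item in recommended[:k] if item in golden]

print(golden_ranks_of_hits(model1, golden))  # [18, 12, 19, 11] -> hits are deep in the golden ranking
print(golden_ranks_of_hits(model2, golden))  # [1, 2, 3]        -> hits are the top of the golden ranking
```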
I looked at measures such as precision@k and mean average precision@k, but none of them seems to capture the point I am trying to validate.
Please let me know if there is a suitable way to solve this issue.
I am happy to provide more details if needed.