
This is my first post here and I'm new to this area so please forgive me if I'm asking a naive question.

I want to rank a number of batsmen (e.g., in cricket) by their skill. I'm planning to use batting averages as a measure of skill.

But there are some batsmen who have only been to bat a few times, and therefore their average is not a good estimate of performance (i.e., I don't have a very good estimate of their "true" average).

Assume that, for each batsman, I only have the score from each time he came to bat (e.g., 89, 4, 92, 45, ...).

How could I best sort them?

Thanks in advance.

whuber
Akanes

1 Answer


"Best" is impossible to answer since you don't give a criterion you want to optimize (that is, until you define pretty precisely what 'better' is for your purpose, 'best' isn't really possible to judge).

Trying to rank things with very different numbers of observations (and hence very different precision) is a common problem (think of online ratings for movies, for example), so you might search around for solutions to quite different problems with the same issue.

I'll present one approach I've seen used:

Take the observed value as an estimate of a population quantity (the batter's true ability*), calculate a one-sided interval for that quantity (a lower bound), and then order by that bound. You could call it an "experience-adjusted average" (or, more accurately, an inexperience-adjusted one), which downrates the average more for less experience.

* This assumes we're not dealing with a shifting target -- if we are (and you'd expect with batting in any sport that it would shift over time), then you might weight by recency as well, such as an exponentially weighted moving average. This will still work with the above suggested approach.
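To make the recency-weighting idea concrete, here is a minimal sketch of an exponentially weighted moving average. The function name `ewma`, the smoothing weight `alpha=0.3`, and the scores are all assumptions for illustration; nothing here says what weight would suit real batting data.

```python
def ewma(scores, alpha=0.3):
    """Exponentially weighted moving average.

    `scores` must be ordered oldest to newest. `alpha` (an assumed
    value, not a recommendation) is the weight given to each newer
    score; larger values forget older innings faster.
    """
    if not scores:
        raise ValueError("need at least one score")
    estimate = scores[0]  # start from the oldest score
    for score in scores[1:]:
        # Blend the new score with the running estimate.
        estimate = alpha * score + (1 - alpha) * estimate
    return estimate


# Made-up scores: an improving batsman is rated above his plain mean.
improving = [10, 20, 30, 40, 50]
print(ewma(improving, alpha=0.5), sum(improving) / len(improving))
```

With `alpha=0.5` the later, higher scores dominate, so the weighted value sits above the unweighted mean of 30.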

Whether this suits your purpose or not, I can't say, but with a reasonably chosen interval it often doesn't perform too badly (e.g. perhaps a 75% or an 80% interval might work okay; you can try other values).
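As a rough sketch of the lower-bound ordering described above, assuming the plain mean of the scores is the quantity of interest and using a normal approximation for the interval (my simplification; a t-based bound would be wider for very small samples), something like the following could work. The names `lower_bound` and `rank_batsmen` and the example scores are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def lower_bound(scores, level=0.80):
    """One-sided lower confidence bound for the mean score.

    Uses a normal approximation (an assumption of this sketch).
    """
    n = len(scores)
    if n < 2:
        # Spread cannot be estimated from fewer than two innings,
        # so such a batsman is ranked last.
        return float("-inf")
    z = NormalDist().inv_cdf(level)  # one-sided critical value
    return mean(scores) - z * stdev(scores) / sqrt(n)


def rank_batsmen(scores_by_batsman, level=0.80):
    """Return names sorted from highest to lowest lower bound."""
    return sorted(
        scores_by_batsman,
        key=lambda name: lower_bound(scores_by_batsman[name], level),
        reverse=True,
    )


# Made-up scores, purely for illustration.
example = {
    "veteran": [45, 52, 38, 61, 47, 55, 40, 58, 49, 51],
    "newcomer": [95, 10],
}
print(rank_batsmen(example))
```

Here the newcomer has the higher raw average (52.5 against 49.6) but only two innings, so the lower bound drops well below the veteran's and the veteran is ranked first. Changing `level` changes how harshly inexperience is penalised.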

Glen_b
  • I was thinking about asking a similar question at some point. This is what I currently do in tables as well, but I was wondering if there is a quick and dirty way to do empirical Bayes shrinkage with rates. (I will have to re-read those Efron and Morris papers on the James-Stein estimator more closely.) – Andy W Mar 13 '15 at 12:41
  • @AndyW It sounds like it might make for a better answer than this one. Actually I think a Bayesian approach to this problem would probably make more intuitive sense, so perhaps it might also be better in terms of explaining it to other people. – Glen_b Mar 13 '15 at 13:46
  • Glen_b and Andy W, thanks for your answers. In my problem we can assume a static target per batsman (i.e., the skill I'm ranking them on does not vary over time). @Glen_b, do you have any pointers to one of the movie ranking methods you mentioned in your answer? – Akanes Mar 13 '15 at 15:48
  • @Akanes - the first time I saw this referenced was by [Evan Miller](http://www.evanmiller.org/how-not-to-sort-by-average-rating.html) for online product reviews with +/- votes. It is such a simple concept though I'm sure it has been suggested multiple times independently. (The idea extends to really any numeric value, not just rates.) – Andy W Mar 13 '15 at 16:21