Although I think my question might be a specific case of a generic problem, I'll include some background info that might be relevant. There is a sort of concierge/delivery service I know that uses a model fairly similar to Uber. As with Uber, it uses an anonymous, five-star rating system (no half stars) for feedback. While Uber lets its drivers rate passengers, this service does not currently allow its workers to give any kind of feedback on customers.
As you can imagine, there is a lot of angst regarding bad ratings. I don't know how common it is, but a rating of three or lower results in that worker no longer being allowed to pick up jobs for that customer. I don't believe the customer is necessarily ever aware that selecting the midpoint response has such a severe consequence. Another point of some frustration is that customers who haven't already left feedback are forced to do so at the point they want to use the service again. And finally, another possible source of bias is that, until recently, customers couldn't leave any further comments along with a five-star rating, whereas every other rating prompted for more details.
Here's another point that's possibly significant: a customer can actually change a worker's rating at any time, so it appears to me that ratings are intended not as a measure of performance per job but rather as a measure of the worker overall. I'm very curious to see for myself the interface of the mobile app that collects this information, but I just haven't had the opportunity yet.
These figures are used to calculate a rolling average, and in this case that average drives a ranking used to determine the order in which workers on shift get notified of new jobs, and thus the opportunity to snag them. Workers can have multiple territories and, as another twist, I believe the ratings generally apply per territory, at least in the first stage of ranking. In many cases the sample could be small or non-existent. The ranking is never explicitly revealed, and apparently the formula also involves a few other stats and all-time fallbacks/tie-breakers.
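To make the mechanism concrete, here is a minimal sketch of how I imagine a rolling average feeding a ranking. The window size of 20, the function names, and the data layout are all my own assumptions; the real formula is not published and reportedly involves other stats, so this is only an illustration of the general idea.

```python
def rolling_average(ratings, window=20):
    """Mean of the most recent `window` ratings, or None if there are none.

    `window=20` is an assumed value, not the service's actual setting.
    """
    recent = list(ratings)[-window:]
    if not recent:
        return None
    return sum(recent) / len(recent)


def rank_workers(ratings_by_worker, window=20):
    """Order workers by rolling average, highest first; unrated workers last."""
    def sort_key(worker):
        avg = rolling_average(ratings_by_worker[worker], window)
        # Tuples sort by first element, so unrated (True) comes after rated (False).
        return (avg is None, -(avg or 0.0))
    return sorted(ratings_by_worker, key=sort_key)


# Hypothetical territory: "b" has a single five-star rating,
# "a" has three ratings, and "c" has none yet.
territory = {"a": [5, 5, 4], "b": [5], "c": []}
print(rank_workers(territory))  # ['b', 'a', 'c']
```

Note that in this toy example a worker with one rating outranks a worker with three mostly perfect ratings, which is the small-sample concern in a nutshell.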
Let's mostly ignore that and focus on the big rating average. There is a lot of variance in the number of completed jobs across the workers. At a given time there are often only a handful of workers active in each territory. New workers initially get an artificial boost to their scores so they can get in the game.
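To show why the differing job counts worry me, here is a rough sketch comparing a plain average with an average padded by pseudo-ratings. I don't know how the service actually implements its boost for new workers; the padding approach, the `prior_mean` of 4.8, and the `prior_weight` of 10 are purely my assumptions for illustration.

```python
def plain_average(ratings):
    """Ordinary mean of the ratings, or None if there are none."""
    return sum(ratings) / len(ratings) if ratings else None


def padded_average(ratings, prior_mean=4.8, prior_weight=10):
    """Mean after adding `prior_weight` pseudo-ratings of value `prior_mean`.

    Both defaults are assumed values. With few real ratings the result stays
    near `prior_mean`; with many real ratings it approaches the plain average.
    """
    total = prior_mean * prior_weight + sum(ratings)
    return total / (prior_weight + len(ratings))


newcomer = [3]              # one mediocre rating
veteran = [5] * 100 + [3]   # the same mediocre rating among 100 perfect ones

for name, ratings in [("newcomer", newcomer), ("veteran", veteran)]:
    print(name, round(plain_average(ratings), 3), round(padded_average(ratings), 3))
```

With a plain average, the same single three-star rating drops the newcomer all the way to 3.0 but barely moves the veteran, so one rating's effect on the ranking depends heavily on how many jobs the worker has completed.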
Hopefully this is enough information to make some broad generalizations about it. I know that there are problems trying to use a Likert scale to generate and interpret averages and I've read this question in particular. (Is this in fact a Likert scale or just another kind of ordinal scale?) I already liked the idea of adjusting the customers' scores as suggested there. But ultimately I am very much a novice in this field and I want to know where this performance metric sits on a spectrum between perfectly reasonable and entirely unfair. Are there any factors here that make this whole system a bad idea?