I've read "How Not to Sort By Average Rating", which explains how to average binary positive/negative ratings in a way that takes the number of ratings into account. The author uses the "lower bound of Wilson score confidence interval for a Bernoulli parameter". However, the items I'm dealing with have continuous ratings from 0 to 1. Is there an analogous averaging technique for this case?
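For reference, here is a minimal sketch of the binary-rating score the post describes: the lower bound of the Wilson score interval for a Bernoulli parameter. It assumes `positive` out of `total` ratings were positive. It also assumes `z = 1.96`, which is a choice made here (a 95% two-sided interval) and not something fixed by the post.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a Bernoulli parameter.

    positive: number of positive ratings
    total:    total number of ratings
    z:        standard normal quantile (1.96 ~ 95% two-sided; assumed default)
    """
    if total == 0:
        return 0.0  # no evidence: rank at the bottom
    phat = positive / total
    z2 = z * z
    # (phat + z^2/2n - z*sqrt(phat(1-phat)/n + z^2/4n^2)) / (1 + z^2/n)
    centre = phat + z2 / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)
```

With this score, 9 positives out of 10 rank above 1 positive out of 1 (about 0.60 vs. about 0.21), which is the sample-size behaviour the post is after.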
My ratings collection is long-tailed: the median item has only 2 ratings, the mean is 80 ratings per item, and the most-rated item has 36,000 ratings. Intuitively, ten ratings of 0.8 should "average" higher than a single rating of 0.9, but I'd like a precise formulation of this intuition.
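To make the intuition concrete: the plain mean ranks a single 0.9 above ten 0.8s. One naive way to get the opposite ordering is to treat each continuous rating in [0, 1] as a fractional success and plug the sample mean into the Wilson formula. This is only a heuristic assumption for illustration. Continuous ratings are not Bernoulli trials, so the resulting "interval" has no guaranteed coverage, and `fractional_wilson_lower_bound` is a hypothetical helper rather than an established method.

```python
import math
from statistics import mean


def fractional_wilson_lower_bound(ratings: list[float], z: float = 1.96) -> float:
    """Heuristic: Wilson lower bound with ratings in [0, 1] treated as
    fractional successes (an assumption, not a justified Bernoulli model)."""
    if not ratings:
        return 0.0
    if any(r < 0.0 or r > 1.0 for r in ratings):
        raise ValueError("ratings must lie in [0, 1]")
    n = len(ratings)
    phat = sum(ratings) / n
    z2 = z * z
    centre = phat + z2 / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


ten_eights = [0.8] * 10
one_nine = [0.9]

# Plain mean: the single 0.9 wins.
print(mean(one_nine) > mean(ten_eights))  # True
# Heuristic lower bound: ten 0.8s (~0.49) beat one 0.9 (~0.17).
print(fractional_wilson_lower_bound(ten_eights) > fractional_wilson_lower_bound(one_nine))  # True
```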
(I'm using this to design a recommender system, which has to deal with 50,000 users and 10,000 items. I'm evaluating various known recommenders, like GroupLens and LSI, and have to design one that doesn't perform too much worse than those (and hopefully better). I was reminded of this blog post on averages when using users' average ratings for a baseline RMSE calculation.)
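For context, a minimal sketch of the kind of baseline mentioned above: predict each held-out rating by that user's mean training rating, then compute the RMSE. It assumes ratings arrive as `(user_id, item_id, rating)` triples, which is a hypothetical format. It also assumes a fallback to the global mean for users with no training ratings. Both are choices made here, not part of the original setup.

```python
import math
from collections import defaultdict
from typing import Hashable, Iterable, Tuple

Rating = Tuple[Hashable, Hashable, float]  # (user_id, item_id, rating), hypothetical format


def user_mean_baseline_rmse(train: Iterable[Rating], test: Iterable[Rating]) -> float:
    """RMSE of predicting each test rating with the user's mean training rating.

    Users absent from `train` are predicted with the global training mean
    (an assumed fallback).
    """
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    total, n = 0.0, 0
    for user, _item, rating in train:
        sums[user] += rating
        counts[user] += 1
        total += rating
        n += 1
    if n == 0:
        raise ValueError("training set is empty")
    global_mean = total / n

    sq_err, m = 0.0, 0
    for user, _item, rating in test:
        # `in` check avoids inserting unseen users into the defaultdicts
        pred = sums[user] / counts[user] if user in counts else global_mean
        sq_err += (rating - pred) ** 2
        m += 1
    if m == 0:
        raise ValueError("test set is empty")
    return math.sqrt(sq_err / m)
```

The user mean for someone with only one or two ratings is noisy, and that is the same small-sample problem the Wilson bound addresses for items.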