Sorting products by reviews considering the number of reviews

Question

I have a large number of products and many reviews for them, each including a rating of the product.

My problem is that not every product has the same number of reviews. For example, one product may have 125 reviews with an average rating of 4.2/5, whereas another may have a single review of 5/5.

Is there a model or algorithm that can sort my products from best to worst while taking the number of reviews into account?

This may be a simple question, but I've never seen it posed this way. I suppose it is a common problem, but I could not find the right terms to search for solutions.

Example (asked by RayVelcoro)

Let's say I have 2 movies:

- A: rated 4.3/5 from 152 reviews
- B: rated 5/5 from 2 reviews

Since B has been rated only twice, I cannot be sure that B is better than A: once B has more than 50 reviews, it might be rated only 3.8/5.

How can I take that into account in a search that sorts results by best movie?

Without an example, it is hard, but I think this would be a good use for the [aggregate](https://stat.ethz.ch/R-manual/R-patched/library/stats/html/aggregate.html) function in R. — RayVelcoro, Sep 22 '15 at 16:06
I have added an example to the question above. Is it more understandable that way? — haverchuck, Sep 22 '15 at 16:13
On an unrelated note, are you a Haverford student/alumnus? (if you're comfortable answering) — jlimahaverford, Sep 22 '15 at 16:19
@jlimahaverford no, my name is a reference to the character from Freaks and Geeks — haverchuck, Sep 22 '15 at 21:12

score 1 · Accepted Answer · answered Sep 22 '15 at 16:17

I'll start with a very simple suggestion: add $5$ $3$-star reviews to each product. In your first example (125 reviews averaging $4.2$ versus a single $5$-star review), we would then be comparing an average of $4.15$ to an average of $3.33$. This starts every product at $3$ and requires some data to move it away from $3$.
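A minimal Python sketch of the pseudo-count average. The function name `pseudo_count_mean` and its defaults (5 pseudo-reviews of 3 stars) are illustrative choices matching the suggestion, not an established API. The first two calls reproduce the figures from the question's first example; the last two apply the same rule to the movie example.

```python
def pseudo_count_mean(ratings_sum, n_reviews, n_pseudo=5, pseudo_rating=3.0):
    """Average rating after adding n_pseudo fictional reviews of pseudo_rating.

    The defaults (5 reviews of 3 stars) are illustrative, not tuned values.
    """
    return (ratings_sum + n_pseudo * pseudo_rating) / (n_reviews + n_pseudo)

# First example: 125 reviews averaging 4.2 vs a single 5-star review
print(round(pseudo_count_mean(125 * 4.2, 125), 2))  # 4.15
print(round(pseudo_count_mean(5.0, 1), 2))          # 3.33

# Movie example: A = 4.3/5 over 152 reviews, B = 5/5 over 2 reviews
print(round(pseudo_count_mean(152 * 4.3, 152), 2))  # 4.26
print(round(pseudo_count_mean(2 * 5.0, 2), 2))      # 3.57
```

Under this rule A (about 4.26) ranks above B (about 3.57), which is the behavior the question asks for.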

While this may seem like a silly idea with no mathematical justification, it is not. Pseudo-counts are derived from Bayesian models whose posterior means can be computed simply by adding a few fictional data points. I have not formulated a specific prior here because I know very little about distributions on ordinal data such as yours, but the idea still works.
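One way to see the Bayesian connection, under an assumption the answer deliberately avoids (treating ratings as normal rather than ordinal): suppose each rating $x_i$ of a product is $\mathcal{N}(\theta, \sigma^2)$ with known $\sigma^2$, and put the prior $\theta \sim \mathcal{N}(\mu, \sigma^2/n)$ on the product's true quality. The posterior mean is then a precision-weighted average, which is exactly the mean after adding $n$ fictional reviews of value $\mu$:

$$
\mathbb{E}[\theta \mid x_1, \dots, x_N]
= \frac{\frac{n}{\sigma^2}\,\mu + \frac{N}{\sigma^2}\,\bar{x}}{\frac{n}{\sigma^2} + \frac{N}{\sigma^2}}
= \frac{n\mu + \sum_{i=1}^{N} x_i}{n + N}
$$

With $n = 5$ and $\mu = 3$ this is the rule suggested above.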

This is a form of regularization, or shrinkage, and the same effect can be achieved in many ways, with models of varying complexity.

If you like the simplicity of pseudo-counts but want to be a bit more rigorous, you can add $n$ reviews with an average of $\mu$ to each product and tune $n$ and $\mu$ using cross-validation to maximize some measure of prediction accuracy. I would start with $\mu$ as the average rating over your whole data set and experiment with values of $n$.
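A minimal sketch of the tuning step in Python, under several assumptions not in the answer: prediction accuracy is measured as squared error on one randomly held-out review per product (products with a single review are skipped), only $n$ is tuned with $\mu$ fixed to the average of all training ratings, and the data are synthetic. All function names are hypothetical.

```python
import random
from statistics import mean

def shrunk_mean(ratings, n, mu):
    """Average rating after adding n pseudo-reviews, each equal to mu."""
    return (sum(ratings) + n * mu) / (len(ratings) + n)

def split_one_out(products, rng):
    """Hold out one random review from every product that has at least two."""
    train, held_out = {}, {}
    for pid, ratings in products.items():
        if len(ratings) < 2:
            continue  # holding out its only review would leave nothing to train on
        i = rng.randrange(len(ratings))
        held_out[pid] = ratings[i]
        train[pid] = ratings[:i] + ratings[i + 1:]
    return train, held_out

def tune_n(products, candidates, seed=0):
    """Pick the n whose shrunk means best predict the held-out reviews (lowest MSE)."""
    train, held_out = split_one_out(products, random.Random(seed))
    # mu = sum of all training ratings / number of training ratings
    all_train = [r for ratings in train.values() for r in ratings]
    mu = sum(all_train) / len(all_train)
    errors = {
        n: mean((held_out[pid] - shrunk_mean(train[pid], n, mu)) ** 2 for pid in held_out)
        for n in candidates
    }
    best_n = min(errors, key=errors.get)
    return best_n, mu, errors

# Hypothetical synthetic data: 200 products, each with a hidden quality and 1-30 ratings
rng = random.Random(42)
products = {}
for pid in range(200):
    quality = rng.gauss(3.5, 0.5)
    count = rng.randint(1, 30)
    products[pid] = [min(5, max(1, round(rng.gauss(quality, 1.0)))) for _ in range(count)]

best_n, mu, errors = tune_n(products, candidates=[0, 1, 2, 5, 10, 20, 50])
print(best_n, round(mu, 2))
```

A single held-out split is noisy; averaging the errors over several seeds (or using a proper k-fold split) would give a steadier choice of $n$. Tuning $\mu$ as well would mean searching over pairs $(n, \mu)$ in the same loop.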

Thanks! That seems to be a good solution. I have one more question: do you take μ as the average rating over the dataset of reviews, or over the dataset of products? That is, do you weight each product's rating by its number of reviews, or does every product's rating carry the same weight in μ? — haverchuck, Sep 22 '15 at 21:19
(sum of all reviews / number of all reviews) was what I was suggesting. — jlimahaverford, Sep 22 '15 at 21:20
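To make the distinction in this exchange concrete, a small Python sketch of the two candidate definitions of $\mu$ (function names and data are hypothetical); the reply recommends the first, review-weighted one.

```python
def mu_review_weighted(products):
    """Sum of all ratings / number of all ratings: every review counts once."""
    all_ratings = [r for ratings in products.values() for r in ratings]
    return sum(all_ratings) / len(all_ratings)

def mu_product_weighted(products):
    """Mean of per-product averages: every product counts once."""
    averages = [sum(r) / len(r) for r in products.values() if r]
    return sum(averages) / len(averages)

# Hypothetical data: A has four reviews, B has a single 1-star review
products = {"A": [4, 4, 5, 4], "B": [1]}
print(mu_review_weighted(products))   # 3.6
print(mu_product_weighted(products))  # 2.625
```

The two differ most when products with few reviews have unusual averages: in the review-weighted version, heavily reviewed products dominate $\mu$.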

score 0 · Answer 2 · answered Sep 22 '15 at 16:29

This looks like a Bayesian Hierarchical Model waiting to happen.

Fit an unknown parameter for each movie/product that links to the data through a multinomial or ordered logit model, or possibly even just a simple normal. Then put a prior on those parameters and fit the model. Movies/products with many ratings will have a parameter determined mainly by the data (pulled slightly toward the mean by the prior). Movies/products with very few ratings will be strongly influenced by the overall mean and only a little by their own ratings (how much depends on the hyperprior). So a product/movie with 2 ratings of 5 out of 5 will be pulled toward the mean, but will still rank above one with 2 ratings of 1 out of 5 or 3 out of 5. Sort on the posterior mean (or median, mode, etc.) of the movie/product parameters, and that should give you what you are looking for.
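A full hierarchical fit would typically be done with MCMC in a tool such as Stan or PyMC. As a lighter stand-in, here is an empirical-Bayes sketch in plain Python of the "simple normal" variant: instead of a hyperprior, the overall mean $\mu$, the within-product variance $\sigma^2$, and the between-product variance $\tau^2$ are plugged in from moment estimates, and each product's score is its posterior mean. The model, the estimators, the function name, and the example data are all assumptions for illustration.

```python
from statistics import mean, pvariance

def eb_normal_scores(products):
    """Empirical-Bayes posterior mean of each product's quality.

    Assumed model (illustrative):
      rating_ij ~ Normal(theta_j, sigma2)   ratings within product j
      theta_j   ~ Normal(mu, tau2)          product qualities
    mu, sigma2 and tau2 are moment estimates, not given a hyperprior.
    """
    means = {pid: mean(r) for pid, r in products.items()}
    counts = {pid: len(r) for pid, r in products.items()}

    # Overall mean: sum of all ratings / number of all ratings
    mu = sum(sum(r) for r in products.values()) / sum(counts.values())

    # Pooled within-product variance (single-review products add no information)
    ss_within = sum((x - means[pid]) ** 2 for pid, r in products.items() for x in r)
    dof = sum(n - 1 for n in counts.values())
    sigma2 = max(ss_within / dof, 1e-6) if dof > 0 else 1.0

    # Between-product variance: spread of product means minus expected sampling noise
    tau2 = pvariance(list(means.values())) - mean(sigma2 / n for n in counts.values())
    tau2 = max(tau2, 1e-6)  # keep the prior variance strictly positive

    scores = {}
    for pid in products:
        prior_precision = 1.0 / tau2
        data_precision = counts[pid] / sigma2
        # Precision-weighted average of the overall mean and the product's own mean
        scores[pid] = (prior_precision * mu + data_precision * means[pid]) / (
            prior_precision + data_precision
        )
    return scores

# Hypothetical example data (lists of 1-5 star ratings)
products = {
    "A": [5] * 60 + [4] * 80 + [3] * 12,  # 152 reviews, mean about 4.32
    "B": [5, 5],
    "C": [1, 1],
    "D": [3, 4, 3, 2, 4, 3],
    "E": [4, 5, 4, 4, 3, 5, 4],
}
scores = eb_normal_scores(products)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the plug-in estimates ignore uncertainty in $\mu$, $\sigma^2$, and $\tau^2$, this understates posterior uncertainty compared with a full hierarchical fit, and with only a handful of products $\tau^2$ is poorly estimated; a larger catalog gives steadier estimates.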
