4

I've got product ratings for a few thousand products. The number of ratings for each product varies from zero to about fifty. I want to find the expected value of product rating for each product. If there are lots of ratings for the product I'd expect the expected value to be the average of the ratings for the product, but if there are only a few I'd expect the expected value to be closer to the average of all ratings. How do I calculate the true expected value? Please be gentle: I'm no statistician or mathematician.

Edit 1: Joris's answer below maintains I can't calculate expected value because by definition that means I must have the entire population. In that case please can you tell me how to calculate the quantity that is similar to expected value in spirit, does not require the entire population, and can make use of prior information.

Edit 2: I would expect that if each product's ratings have low variance, or if there is very high variance between different products' ratings, then the measured ratings are more informative.

naught101
bart
  • although not the same, you may want to see this related post: http://stats.stackexchange.com/questions/1848/bayesian-rating-system-with-multiple-categories-for-each-rating – Jeromy Anglim Sep 01 '10 at 11:55

4 Answers

4

Incorporating a prior is one way to 'make up' for small samples. Another is to use a mixed model, with an intercept for the mean structure and a random intercept for each product. The estimate of the population mean plus the predicted random effect (BLUP) then offers a form of shrinkage, where values for products with less information are shrunk more toward the overall sample mean than those based on more information. This method is common in, for example, Small Area Estimation in survey sampling.

Edit: The R code might look like:

library(nlme)

# Intercept-only mixed model: fixed overall mean + random intercept per product
f <- lme(score ~ 1, data = yourData, random = ~ 1 | product)

# Predictions = overall mean + BLUP: products with few ratings shrink more
p <- predict(f)

If you go this route the assumptions are:

  • independent, normal errors with expected value 0 and constant variance for all observations
  • normal random effects with expected value 0

Violations of these can generally be modeled, but of course with that comes added complexity...
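
To see the shrinkage in action, you can compare raw per-product means with the model's combined estimates. A minimal sketch, reusing the yourData/score/product names from the snippet above; the simulated data is purely illustrative:

library(nlme)

# Purely illustrative data: 200 products with 1 to 10 ratings each
set.seed(1)
counts  <- sample(1:10, 200, replace = TRUE)
product <- factor(rep(seq_along(counts), counts), levels = seq_along(counts))
yourData <- data.frame(product = product,
                       score = rnorm(sum(counts), mean = 3.5, sd = 1))

f <- lme(score ~ 1, data = yourData, random = ~ 1 | product)

# Raw per-product means vs. shrunken estimates; coef() returns the fixed
# intercept plus each product's predicted random effect (the BLUP)
rawMeans <- tapply(yourData$score, yourData$product, mean)
shrunken <- coef(f)[names(rawMeans), 1]

# Products with fewer ratings are pulled harder toward the overall mean
head(data.frame(n = counts, raw = as.numeric(rawMeans), shrunken = shrunken))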

Kingsford Jones
  • Thanks. I've looked at some of your suggestions and they seem to be along the lines I was thinking of. But can you recommend anything more specific that is suitable for a lay person like me? – bart Sep 01 '10 at 18:56
  • @bart -- I wish I could offer a safe automated way to do this but I don't think it exists. See above for edits w/ a little more detail. – Kingsford Jones Sep 01 '10 at 22:54
  • You forget that some products have no score, or only very few. lme is very likely to run into convergence problems, and the estimates of the standard errors cannot be trusted. – Joris Meys Sep 02 '10 at 11:00
  • @Joris - Without a prior you can't include products with 0 information, but because there are thousands of products, having few observations per product is not a problem (even many with 1 obs will be OK). But still, without being familiar with the data, the context of the problem, and the desired output/inferences we are only speculating as to what is appropriate. – Kingsford Jones Sep 02 '10 at 15:44
3

The "true" expected value cannot be calculated. You can estimate it using the mean of the ratings for each product, and get an idea about the position by calculating the 95% confidence interval (CI) on the mean.

This is done by

$CI \approx avg \pm 2 * \frac{SD}{\sqrt{n}}$

with $n$ being the number of ratings, $SD$ the standard deviation, and $avg$ the average. More correct would be to use the T-distribution, where you use the 2.5% and 97.5% quantiles of the T-distribution with degrees of freedom equal to the number of observations minus one.

$CI = avg \pm T_{(p=0.975,df=n-1)} * \frac{SD}{\sqrt{n}}$

For 10 ratings, $T_{(p=0.975,df=n-1)}$ is 2.26. For 50 ratings, it is 2.01.

There's a chance of 95% this confidence interval contains the true value. Or, to please Néstor: if you do this experiment 10,000 times, 95% of the confidence intervals you construct this way will contain the true expected value.

You assume here that the distribution of the average is normal. If you have very few ratings, the SD can be estimated poorly.

In that case, you could estimate an "overall" standard deviation from the scores of all products, and use that to calculate the CI. But keep in mind that this way you assume that the standard deviation is the same for every product.

In extremis, you could resort to bootstrapping to calculate the CI for every product. This will increase the calculation time substantially, and won't be adding any value for products with enough ratings.
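
In R, the T-based interval can be computed per product along these lines. A minimal sketch, assuming a hypothetical data frame yourData with columns product and score (products with a single rating have an undefined SD and get no interval):

# Per-product 95% CI using the T-distribution
ciForProduct <- function(scores, level = 0.95) {
  n    <- length(scores)
  avg  <- mean(scores)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(scores) / sqrt(n)
  c(lower = avg - half, mean = avg, upper = avg + half)
}

# One interval per product; qt(0.975, df = 9) is 2.26, qt(0.975, df = 49) is 2.01
cis <- tapply(yourData$score, yourData$product, ciForProduct)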

Joris Meys
  • As a newbie student maybe I shouldn't be too picky, but I'm not very happy with this answer. Why can't an ev be calculated? Can you prove that statement? Also 95% CI seems plain arbitrary. – bart Sep 01 '10 at 12:57
  • E[X] is a property of the pop that you are drawing inferences about. x-bar is an unbiased estimator of that value. You don't have to use 95% -- adjust t accordingly. But note the SE of x-bar is s/sqrt(n) not s/n. I'll see if I can edit this. – Kingsford Jones Sep 01 '10 at 15:31
  • @Bart : an expected value is about the population, and a theoretical value. It can be seen as the limit of the sample mean when the sample size goes to infinity. You need to estimate it using the mean, but you can't calculate it unless you know the complete population. Which you don't. See : http://en.wikipedia.org/wiki/Expected_value – Joris Meys Sep 01 '10 at 15:40
  • @Joris The wiki entry is much too advanced for me. Maybe I'm not asking for the right quantity. If by definition you need the entire population to calculate ev, I'd like a similar quantity that allows me to incorporate priors when my data is incomplete. So I can update the question, what would I call that? – bart Sep 01 '10 at 16:22
  • @bart, think of this question: what is the expected value of the height of an American? The only way to know this value exactly would be to ask every single person his height and take the average. That's the expected value---it's a property of the full population. That's really hard. Instead, we get a sample, a subset of our population, ask them their heights and take the average. This is the sample mean---a property of our sample and an estimate of the expected value. As our sample gets larger, we should get a sample mean that gets closer to the true expected value (the law of large numbers). – Charlie Sep 01 '10 at 19:03
  • @Bart : You want to estimate the expected value, and you do that by modelling the sample means in any way. It sounds trivial, but it is crucial to understand what exactly you're doing. I see far too many students struggle with statistics because they don't understand the underlying concepts. I don't want to be rude, but it might be a good idea to look for a thorough introduction to statistics. It will help you tremendously in understanding what is going on. – Joris Meys Sep 02 '10 at 11:09
  • @Joris You are not being rude. I've tried studying maths and stats in a conventional way and I have to say it hasn't been very successful. What seems to work for me is to reason the problems out in my own special way and then see how that matches up with what others do. – bart Sep 02 '10 at 17:07
  • @Joris How do I calculate the ev from the CI? How does this approach make use of the prior information provided by the other products? – bart Sep 02 '10 at 19:06
  • @Bart. The expected value is a theoretical concept. Statistics is used to approximate it. If you can calculate the true expected value, there is no need for a confidence/credibility interval any more, because (Frequentist) the chance the true expected value is the true expected value, is 1, or (Bayesian) your true expected value is not a random variable. I gave you the frequentist approach, no strict priors involved. You use the information of the other products by calculating a common standard deviation. So if you want a Bayesian approach, my answer is indeed not what you're looking for. – Joris Meys Sep 03 '10 at 11:50
  • *"There's a chance of 95% this confidence interval contains the true expected value".* **NO**. The confidence interval is one of many intervals which, on average, contain the true value 95% of the time. In plain english: if you repeat the experiment of sampling $n$ values from this normal distribution many, many times, each time you'll get a different confidence interval; 95% of them will contain the true value (for a 95% confidence interval). – Néstor Jul 10 '12 at 01:29
  • @Néstor As in: Do this 10,000 times and 95% of the confidence intervals you construct will contain the true expected value. And since you don't know which of these CIs you have, you have a probability of 95% that your confidence interval contains the true value. Delete the expected, that's a typo. – Joris Meys Jul 10 '12 at 08:20
  • @JorisMeys No. The calculated C.I. doesn't have a probability of containing the "true parameter" (in a frequentist setting) because the parameter is either in your interval or not. Please read this post for a further development of this point: http://stats.stackexchange.com/questions/11609/clarification-on-interpreting-confidence-intervals – Néstor Jul 10 '12 at 16:07
  • @Néstor I've had this semantic discussion a hundred times over, mostly with bayesians. As long as you don't know the true value, you have a probability. Once you know it, the CI renders itself useless. As long as the winning numbers of the lottery aren't known, you can talk about your chance to win the lottery. Once you know the winning numbers, you either won or you didn't. So as long as you can't tell me for sure whether or not the true value is contained in my calculated CI, I can only talk about the probability that it's in that calculated CI. YMMV. – Joris Meys Jul 10 '12 at 16:13
  • @JorisMeys I'm ok with that. The only problem: you can't talk about a 95% probability that the "true parameter" lies in a 95% confidence interval. There is a probability, of course, but in general it is not 95%. Please read the post (we'll avoid coming up with the same arguments as there!). – Néstor Jul 10 '12 at 16:23
  • @Néstor reformulate as 'the probability that the 95% confidence interval contains the true value', which is, all assumptions taken into account and defining probability as the 'relative frequency of occurrence' or 'propensity', 0.95. Don't forget as well that probability is defined differently in frequentist and bayesian theories. BTW, I've read the whole discussion, but if I remember correctly it was you who brought the arguments given there to this discussion... – Joris Meys Jul 11 '12 at 17:40
0

I haven't looked into it much, but this article on Bayesian rating systems looks interesting.

Jeromy Anglim
  • Thanks for the link. I agree that the number of ratings for the product compared to the average number for all products is significant. But the variance must be significant too. Suppose every product had zero variance in its ratings; then a single product rating would be sufficient to provide the ev. – bart Sep 01 '10 at 12:50
  • Good point. I just wanted to flag that the Bayesian option seems like the way to go. Of course @Kingsford has now provided a more rigorous explanation. – Jeromy Anglim Sep 02 '10 at 08:49
0

Ha! I've answered my own question. Simon Funk figured this out for the Netflix challenge here. See the paragraph commencing "However, even this isn't quite as simple as it appears". But I'm having difficulty proving it algebraically: maybe you guys would like to take that on.
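
For what it's worth, the rule in that paragraph amounts to blending each product's ratings with K pseudo-ratings pinned at the global mean. A minimal sketch in R, based on the post's description (the function name is made up; K = 25 is Simon's hand-picked value):

# Blend: pretend each product has K extra ratings equal to the global mean
blendedMean <- function(scores, globalMean, K = 25) {
  (K * globalMean + sum(scores)) / (K + length(scores))
}

blendedMean(c(5, 5), globalMean = 3.5)     # 2 ratings: pulled to ~3.61
blendedMean(rep(5, 50), globalMean = 3.5)  # 50 ratings: only pulled to 4.5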

bart
  • You should delete this 'answer' and add it to your question as you have not really found the answer, as is evident from the last line. –  Sep 02 '10 at 17:48
  • Nice find, but when Simon writes "view that single observation as a draw from a true probability distribution who's average you want...and you can view that true average itself as having been drawn from a probability distribution of averages" he is describing a mixed or multilevel model. I don't think there's a need to guess that "K=25 seems to work well" because the best linear unbiased predictor equations were worked out more than 60 years ago (BLUP). Excellent Bayesian estimators exist as well. As Brad Efron said, "Those who ignore Statistics are condemned to reinvent it" – Kingsford Jones Sep 02 '10 at 18:00
  • @Kingsford What's wrong with K = Vb/Va? – bart Sep 02 '10 at 18:57
  • @Srikant It's the best "answer" I've got so far. I think I'll be trying it out because: 1. As currently stated, Joris's doesn't actually provide the ev, and there is no indication of how prior information is utilised. 2. Kingsford's answer could suffer convergence problems apparently, and there is a lot of work for me to figure out what it all means. 3. Simon's solution worked well for him, and I trust Simon. 4. I think in time I'll get a proof of his method. 5. The method uses all the available data and is correct at the extrema when individual variance is small and large. – bart Sep 02 '10 at 19:28
  • @bart - I don't think you'll run into convergence problems, but you're right that it takes a while to learn the methods. As for what's wrong with Vb/Va, I'm not sure how to answer because it's not clear to me how those values are being calculated. But under the nested Gaussian assumption described, one good way to define 'best' is in terms of minimizing mean squared error (MSE). This is what the BLUP equations do. See, for example, [here](http://tiny.cc/4zfl1). – Kingsford Jones Sep 02 '10 at 21:00
  • @bart It is not driven by sound statistical principles, as is evident from the hand waving he does. If you want a statistically sound solution you should really explore the multi-level models that Kingsford refers to in his comments. See my [answer](http://stats.stackexchange.com/questions/1822/test-for-poolability-of-individual-data-series/1827#1827) to another question which illustrates the basic idea. –  Sep 03 '10 at 00:22