Bayesian Updating for a Discrete Rating Value

Question

I have an item (a movie) on a website for which I slowly collect ratings. At the beginning it has no ratings, but I assign it a Gaussian prior $N(\mu_0, \sigma_0^2)$. A person visits the website and gives it 4 out of 5 (note that I have only discrete ratings of 1, 2, 3, 4 and 5 stars).

My question is: how can I now update my prior and get a posterior distribution for the mean number of stars?

My second question: suppose I take the median of the scores as the final score that I want to show to people for that movie. How can I do my updating then? I imagine I have to form a discrete prior as well.

My last question: if I wanted to assign a weight to each rating and find the weighted mean and median, how can I do that?

These might be easy questions, but I am very confused about them. If you know a book or paper that discusses a similar problem, I would appreciate it if you could refer me to it too.

Would R code that does a fully Bayesian analysis for this be useful for you? Likely it would be based on direct simulation and not use any packages or involve MCMC. — phaneron, Nov 06 '12 at 13:35
I have done a bit more and would like feedback on how to make things clearer. Did you want me to tweet you? — phaneron, Nov 08 '12 at 22:06
Sorry I was busy and have not been able to check the code yet. Will do it during the weekend. Thanks a lot for suggestions — MarkSAlen, Nov 09 '12 at 16:51
as a general read for that type of problem I recommend _Bayesian Data Analysis_ by Gelman et al. — mlwida, Nov 11 '12 at 16:16

score 4 · Accepted Answer · edited Apr 13 '17 at 12:44

A general strategy for getting an initial grasp of Bayesian methods was suggested in "The connection between Bayesian statistics and generative modeling" and in "Bayesian meta analysis of residual standard deviation using BUGS".

Below is a sketch of that strategy for this question, where I think it works quite well, at least for me ;-)

# Set of values that may become known (the rating from 1:5)
Knowns<-1:5
# A probability generating model for one such known (all equal probability)
sample(1:5,size=1,prob=c(.2,.2,.2,.2,.2))
# The possible unknowns that need to be _used_ to generate one such possible known
# First an example
PossibleUnknown<-c(.2,.2,.2,.2,.2)
sample(1:5,size=1,prob=PossibleUnknown)
# Note the unknowns here are the probabilities of a rating of 1,2, ... 5 and the known is the first rating given

# Getting a probability distribution for the possible unknowns (a prior distribution)
# Not immediate because it has 5 elements in 4 dimensions (as the probabilities must sum to 1)
# Fortunately Bayesians have one we can start with
library(MCMCpack) # Just used to get the prior
(PossibleUnknown<-rdirichlet(1,rep(1,5) ))
sum(PossibleUnknown)

# OK, now using the two-stage conceptualization of Bayes by Don Rubin (1984), it is direct and transparent

# Number of MC samples to generate
reps<-1000000
# Sample from prior of Possible UnknownS
PossibleUnknowns<-rdirichlet(reps,rep(1,5))
# Sample rating from data generating model for each Possible Unknown above
PossibleKnowns<-apply(PossibleUnknowns,1,function(x) sample(Knowns,size=1,prob=x) )
PossibleJoints<-cbind(PossibleUnknowns,PossibleKnowns)
head(PossibleJoints)

# The sample from the Posterior if 1st rating is 5 
# (those Possible Unknowns that generated PossibleKnowns = 5)
ConditionalUnknowns<-PossibleUnknowns[PossibleKnowns==5,]

# Plot prior and posterior marginals (separate rating probabilities) to _see_ what's going on
# Only ratings 2 to 5 are plotted so that each set fits on one 2x2 page
par(mfrow=c(2,2))
for(i in 2:5) hist(PossibleUnknowns[,i],main=paste("Prior: rating of",i),xlab="PossibleUnknown\n(probabilities)")
for(i in 2:5) hist(ConditionalUnknowns[,i],main=paste("Posterior: rating of",i),xlab="PossibleUnknown\n(probabilities)")

# Calculate marginal or expected probability for each rating separately
# Prior probabilities
apply(PossibleUnknowns,2,mean)
# Posterior probabilities
apply(ConditionalUnknowns,2,mean)
# A good guess at the prior probabilities??
rep(1,5)/sum(rep(1,5))
# A good guess at the posterior probabilities??
(rep(1,5) + c(0,0,0,0,1))/sum((rep(1,5) + c(0,0,0,0,1)))

# So if the next rating is a 5
(rep(1,5) + c(0,0,0,0,2))/sum((rep(1,5) + c(0,0,0,0,2)))
# Easy to check directly, because the prior for the second rating is the posterior from the 1st rating
PossibleUnknowns<-ConditionalUnknowns
PossibleKnowns<-apply(PossibleUnknowns,1,function(x) sample(Knowns,size=1,prob=x) )
ConditionalUnknowns<-PossibleUnknowns[PossibleKnowns==5,]
# Posterior probabilities
apply(ConditionalUnknowns,2,mean)
# Not bad
(rep(1,5) + c(0,0,0,0,2))/sum((rep(1,5) + c(0,0,0,0,2)))
# OK, now read the Wikipedia article on the Dirichlet distribution for the math
# Now the above was not just a warm-up exercise, as for instance if you want to 
# use different priors, perhaps an empirical prior based on past ratings of 
# films of the same genre - you now know how to do that! 

# Now the mean rating, if you are interested in that, is the sum over ratings of (probability of rating * rating)
# Prior mean rating 
sum( rep(1,5)/sum(rep(1,5)) * Knowns )
# Posterior mean rating given first rating was a 5
sum( (rep(1,5) + c(0,0,0,0,1))/sum((rep(1,5) + c(0,0,0,0,1))) * Knowns )
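
The "good guesses" above follow from the Dirichlet prior being conjugate to the categorical likelihood: the posterior is again Dirichlet, with each observed rating adding 1 to the matching parameter. The following is only a minimal base-R sketch of that shortcut, not part of the original simulation. The helper `update_alpha` is a hypothetical name, and treating a rating's weight as a fractional count is an assumption, one possible way to approach the weighted case from the question. The median shown is that of the expected rating distribution, which may or may not be the summary you want to display.

```r
# Conjugate Dirichlet update in base R (no MCMCpack needed).
ratings     <- 1:5
prior_alpha <- rep(1, 5)   # flat Dirichlet prior, as in the simulation above

# Hypothetical helper: add each observed rating to the Dirichlet parameters.
# ASSUMPTION: a weight is treated as a fractional count (default weight 1).
update_alpha <- function(alpha, observed, weights = rep(1, length(observed))) {
  for (k in seq_along(observed)) {
    alpha[observed[k]] <- alpha[observed[k]] + weights[k]
  }
  alpha
}

# Two ratings of 5, matching the two-step check in the simulation
post_alpha <- update_alpha(prior_alpha, observed = c(5, 5))
post_prob  <- post_alpha / sum(post_alpha)   # expected probability of each rating
post_mean  <- sum(post_prob * ratings)       # expected rating

# Median of the expected rating distribution:
# the smallest rating whose cumulative probability reaches 0.5
post_median <- ratings[which(cumsum(post_prob) >= 0.5)[1]]

print(post_prob)
print(post_mean)
print(post_median)
```

With unit weights, `post_prob` reproduces the `(rep(1,5) + c(0,0,0,0,2))/sum(...)` guess from the simulation, so the two approaches can be checked against each other.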

when you type an answer, there is a symbol "{}" above the text area, which allows you to insert some formatting for the code parts. I really recommend to use it for the code and remove the "boldness" from the non-code parts. In the current version it hurts my eyes ;) — mlwida, Nov 07 '12 at 15:00
@ steffen Sorry, I did try that and could not get it to work nor find help about it and now I have to get back to _real_ work — phaneron, Nov 07 '12 at 15:26
select everything you want to format as code, **THEN** click on "{}" and the formatting is done. There is no need to be offensive. I hoped that the smilie is enough to explain that I was joking. — mlwida, Nov 07 '12 at 15:33
