I am calculating 'popularity' scores for content in a web app based on 'views' and 'likes'.
I have not studied statistics but I have found the method I need to use here:
http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter
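For reference, the lower bound from the article (which reproduces the scores in my table below) is

$$\frac{\hat{p} + \frac{z^2}{2n} - z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}$$

where $\hat{p}$ is the observed fraction of positive ratings (likes/views), $n$ is the total number of ratings (views), and $z$ is the standard normal quantile corresponding to the chosen confidence level.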
The article explains the choice of z value as follows:
"confidence refers to the statistical confidence level: pick 0.95 to have a 95% chance that your lower bound is correct"
I don't understand what 'correct' means here, and why I would choose lower or higher confidence levels. Why not choose 100% confidence and get a 'correct' result?
I have found several other questions whose answers get heavily philosophical and technical, and I'm not clear how they relate to my case:

- What does a confidence interval (vs. a credible interval) actually express?
- What, precisely, is a confidence interval?
- Are there any examples where Bayesian credible intervals are obviously inferior to frequentist confidence intervals?
I have applied the formula and calculated the scores for my data. My question is: why would I choose a lower or higher confidence level, and what does that mean for my scores?
Update
I have partially answered my own question by experimenting with different z values ('confidence levels') and looking at the scores they generate:
| Likes | Views | z    | Score                |
|-------|-------|------|----------------------|
| 1     | 4     | 1.0  | 0.1                  |
| 100   | 400   | 1.0  | 0.2289908334502525   |
| 1     | 4     | 1.96 | 0.045586062644636216 |
| 100   | 400   | 1.96 | 0.21007832849376823  |
| 1     | 4     | 2.58 | 0.029987372072017595 |
| 100   | 400   | 2.58 | 0.19854163422270693  |
From this I can see that choosing a higher z value ('confidence level') assigns a relatively lower score to the item with few views ('total votes', in the formulation of the original article).
I take this to mean that for items with few views we have a lower 'confidence' that the current known ratio is representative of the unknown 'true' ratio that would emerge if we had more data.
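For anyone wanting to check the numbers, the scores in the table above can be reproduced with a short Python sketch of the Wilson lower bound (the function name `wilson_lower_bound` is my own; the formula is the one from the article):

```python
import math

def wilson_lower_bound(likes, views, z=1.96):
    """Lower bound of the Wilson score confidence interval for a
    Bernoulli parameter. z is the standard normal quantile for the
    chosen confidence level (e.g. 1.96 for 95%, 2.58 for 99%)."""
    if views == 0:
        return 0.0  # no data: no confidence the true ratio exceeds 0
    n = views
    phat = likes / n  # observed positive ratio
    return (phat + z * z / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) \
        / (1 + z * z / n)

# 1 like out of 4 views at z = 1.0 gives the 0.1 from the table:
print(wilson_lower_bound(1, 4, z=1.0))
```

Increasing `z` widens the interval, so the lower bound drops more sharply for small `n`, which is exactly the penalty on low-view items visible in the table.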
In comments @whuber has suggested that:
"confidence limits are not likely to be a part of an accurate solution"
So my question now is... is there a better formula I should be using to calculate the 'popularity' score for my data set?