Let's say there are 600 games in an NBA season, and a competition among individuals to correctly forecast the winners of the games. The forecasts are not expressed probabilistically but simply in terms of raw outcomes (e.g. "Bulls will beat Spurs in Round 1").
Person A correctly forecasts the winner in 450 games (75% correct), while Person B correctly forecasts the winner in 400 (about 67% correct).
I'm interested in placing some kind of confidence interval around that 450, with a view to quantifying the distinction between Person A and Person B and establishing whether we have good grounds for saying Person A is more skilled than Person B (and Person C, and others not mentioned), or merely more lucky. I'm ultimately interested in putting all the people on a chart with some sort of confidence interval (or maybe credible interval) around each person.
I'm unsure how to proceed, but I'll briefly explain an approach that occurred to me after I came across the search term "binomial proportion confidence interval". As I understand from that Wikipedia page and from another answer, the normal approximation CI for a proportion is
$\hat{p} \pm z \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
So, I thought the right way forward might be to go
$0.75 \pm 1.96 \times \sqrt{\frac{0.75(1-0.75)}{600}}$
$= 0.75 \pm 1.96 \times \sqrt{\frac{0.1875}{600}}$
$= 0.75 \pm 1.96 \times \sqrt{0.0003125}$
$= 0.75 \pm 1.96 \times 0.017678$
$= 0.75 \pm 0.034648$
This means we can place a 95% confidence interval of (0.715352, 0.784648) around Person A's proportion correct, which was 0.75. This confidence interval excludes Person B's proportion correct, which was about 0.667.
Then we could scale things up to the number correct by multiplying by 600, giving a confidence interval of (429.2112, 470.7888) around Person A's raw score of 450.
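To make the arithmetic above easy to check and to repeat for the other forecasters, here is a minimal Python sketch of the same normal-approximation calculation. The function name `wald_interval` and the variable names are my own, and the calculation assumes each game is an independent trial with the same chance of a correct call, which may not hold in practice.

```python
import math

def wald_interval(correct, n, z=1.96):
    """Normal-approximation interval for a proportion (hypothetical helper).

    Assumes `correct` successes out of `n` independent trials.
    """
    p_hat = correct / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

n_games = 600
scores = {"Person A": 450, "Person B": 400}  # figures from the question

for name, correct in scores.items():
    low, high = wald_interval(correct, n_games)
    # Report the interval as a proportion and scaled up to a count of games.
    print(f"{name}: proportion ({low:.4f}, {high:.4f}), "
          f"count ({low * n_games:.1f}, {high * n_games:.1f})")
```

Running the same function on Person B's score gives that person an interval too, which seems closer to what I want for the chart than comparing Person A's interval against Person B's point estimate.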
Is this a sensible way to proceed? Should I be approaching the issue using bootstrapping, or using some other method?
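For concreteness, by bootstrapping I mean something like the percentile bootstrap sketched below. This is only my guess at how it would be set up: the helper `bootstrap_interval`, the 2000 resamples, and the fixed seed are all my own choices, and resampling games with replacement again treats them as independent.

```python
import random

def bootstrap_interval(correct, n, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a proportion (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    # Represent the season as a list of 1s (correct calls) and 0s (wrong calls).
    outcomes = [1] * correct + [0] * (n - correct)
    # Resample the season with replacement and record each proportion correct.
    props = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return props[lo_idx], props[hi_idx]

low, high = bootstrap_interval(450, 600)
print(f"Person A bootstrap interval: ({low:.4f}, {high:.4f})")
```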
A further potential complication is that Person A and Person B (and others) have been playing this game for many years, so I've accumulated yearly totals for each of them. I'm unsure whether I should be considering some sort of 'rolling' confidence interval over the years, or treating each year separately.