I had an individual take a test with 500 binary (right/wrong) questions before and after an intervention. Say they scored 80% on the first pass and 90% on the second.
I can measure the absolute change in accuracy (10 percentage points in this example), but how do I construct a confidence interval for this change?
Would the right approach be a paired bootstrap, i.e. resample question numbers with replacement and keep both the before and after result for each sampled question, to build up a large number of resampled values of the change in accuracy? If so, how many bootstrap resamples would be appropriate?
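For concreteness, here is a minimal sketch of the paired bootstrap I have in mind, using NumPy. The per-question results below are made up so that the totals match my example (400/500 correct before, 450/500 after); the split into "right both times", "right then wrong", and so on is purely an assumption, and the function name `paired_bootstrap_ci` and the default of 10,000 resamples are my own placeholders rather than a recommendation.

```python
import numpy as np


def paired_bootstrap_ci(before, after, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(after) - mean(before), resampling questions."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    if before.shape != after.shape:
        raise ValueError("before and after must have the same length")

    rng = np.random.default_rng(seed)
    n = before.size

    # Per-question change: -1 (right then wrong), 0 (no change), +1 (wrong then right).
    # Resampling this vector keeps each question's before/after results together.
    diff = after - before

    # Each row is one bootstrap resample of question indices, drawn with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))
    boot_changes = diff[idx].mean(axis=1)

    lo, hi = np.quantile(boot_changes, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), lo, hi


# Hypothetical per-question outcomes (1 = correct, 0 = incorrect).
# Assumed breakdown: 390 right both times, 10 right then wrong,
# 60 wrong then right, 40 wrong both times.
before = np.array([1] * 390 + [1] * 10 + [0] * 60 + [0] * 40)
after = np.array([1] * 390 + [0] * 10 + [1] * 60 + [0] * 40)

point, lo, hi = paired_bootstrap_ci(before, after)
print(f"change = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

My understanding is that the width of the interval depends on how many questions actually changed status between the two passes, which the 80% and 90% totals alone do not tell me, so the real per-question data would be needed. One way I could imagine checking whether the number of resamples is large enough is to rerun with a different `seed` and see whether the interval endpoints move noticeably.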