General Problem Description and Goal
I set up a matching algorithm that matches a user input (a string) against a list of possible values (words). This list is exhaustive but very large (on the order of tens of thousands of entries). I want to evaluate the algorithm's quality in terms of accuracy and attach a statistical significance to that estimate, without comparing it to any other algorithm. The problem arises because I cannot restrict the user input: it is pulled from a website that I do not operate, and the input is partly incorrect, e.g. due to typos, although I know all possible meaningful inputs. The possible inputs are not perfectly uniformly distributed, as some words appear more often than others; nevertheless, there is no predominant class.
Approach
Inspired by this site, I thought of simply using the binomial CDF, since the results of my matching algorithm should be Bernoulli distributed: X_i = 1 if the match is correct and X_i = 0 if it is false. I therefore drew a random sample of 1000 results and manually inspected whether each one was correct. This yielded 985 correct and 15 false matchings. I then calculated the binomial CDF for k = 985, n = 1000 and p = 0.95, where p = 0.95 corresponds to the null hypothesis H_0 that my algorithm is at most 95% accurate. That gave me a CDF value of 0.99999999905952; equivalently, the one-sided p-value P(X >= 985 | p = 0.95) is roughly 10^-9, which would allow me to reject H_0 that my algorithm achieves less than 95% correct matchings at the 1% significance level.
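For reference, here is a minimal sketch of this calculation in Python, assuming SciPy (version 1.7 or later) is available. `scipy.stats.binomtest` performs exactly this kind of one-sided exact binomial test, and its `proportion_ci` method additionally gives an exact (Clopper-Pearson) confidence interval for the accuracy:

```python
from scipy.stats import binomtest

n, k = 1000, 985   # sample size and number of correct matchings
p0 = 0.95          # accuracy under the null hypothesis H_0

# One-sided exact binomial test:
# H_0: true accuracy <= p0  vs.  H_1: true accuracy > p0
result = binomtest(k, n, p0, alternative="greater")
print(f"p-value: {result.pvalue:.3g}")  # ~1e-9, reject H_0 at the 1% level

# Exact (Clopper-Pearson) 99% confidence interval for the accuracy
ci = result.proportion_ci(confidence_level=0.99, method="exact")
print(f"99% CI for accuracy: [{ci.low:.4f}, {ci.high:.4f}]")
```

Reporting the confidence interval alongside the p-value may be more informative than the test alone, since it directly quantifies how precisely the sample of 1000 pins down the algorithm's accuracy.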
Problem and Question
Is this a common and acceptable approach? If not, what kind of test should I consider?
Are there any publications that are a good source for this approach? So far I have only found the website linked above.