The problem: infer a person's nationality from a limited number of features (name, email, ...). I do not have enough "ground truth" to use ML techniques, so I'd like to try what a computer scientist would call a "heuristic model", that is, a model based on empirical evidence and domain experience (if your email is something like foo@bar.fr and your name is Philippe, you are probably French).
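A rule of the kind described above might look like the minimal sketch below. The lookup tables and the function name `predict_nationality` are hypothetical and purely illustrative. One design assumption is that the model abstains (returns `None`) when the two signals disagree or when neither matches, instead of guessing.

```python
from typing import Optional

# Hypothetical lookup tables, for illustration only; a real heuristic model
# would need much larger lists curated from domain knowledge.
TLD_TO_COUNTRY = {"fr": "FR", "it": "IT", "de": "DE", "es": "ES"}
NAME_TO_COUNTRY = {"philippe": "FR", "giuseppe": "IT", "hans": "DE", "javier": "ES"}


def predict_nationality(name: str, email: str) -> Optional[str]:
    """Combine two weak signals (email TLD, first name); None means 'abstain'."""
    tokens = name.split()
    first_name = tokens[0].lower() if tokens else ""
    # "foo@bar.fr" -> domain "bar.fr" -> top-level domain "fr"
    tld = email.rsplit("@", 1)[-1].rsplit(".", 1)[-1].lower()
    by_tld = TLD_TO_COUNTRY.get(tld)
    by_name = NAME_TO_COUNTRY.get(first_name)
    if by_tld and by_name and by_tld != by_name:
        return None  # the signals disagree: abstain rather than guess
    return by_tld or by_name  # one signal, two agreeing signals, or None
```

For example, `predict_nationality("Philippe Martin", "foo@bar.fr")` returns `"FR"`. Because a rule set like this can abstain, its coverage (how often it answers at all) is worth measuring separately from its accuracy.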
My problem is: how do I assess the validity of this model? The question seems (and probably is) naive, but I do not have a strong background in statistics and I'm used to the standard approaches in ML (n-fold cross-validation, etc.). I tried reading books and online resources, but I'm more confused than I was before.
Intuitively, I could take a random subset of the samples, manually determine the ground truth for this subset, and compare it with the outcomes of the model. If the subset is not too small and the results agree closely (after accounting for class imbalance and so on), that should give me confidence that the model is OK. Does this make sense? Do you have a reference showing how this kind of analysis should be performed? Thanks.
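The procedure described above (random sample, manual labels, comparison) can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive method: the function names are hypothetical, the sample is assumed to be a simple random sample from the same population the model will be applied to, and the Wilson score interval is just one standard choice of confidence interval for a proportion. Per-class recall and its mean (balanced accuracy) are included as one way to address the class-imbalance concern; abstentions (`None` predictions) are reported as coverage and excluded from the accuracy figures.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def draw_validation_sample(records: Sequence, k: int, seed: int = 0) -> List:
    """Simple random sample (without replacement) of k records to label by hand."""
    return random.Random(seed).sample(list(records), k)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, centre - half), min(1.0, centre + half))


def required_sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Labels needed for a normal-approximation CI half-width of about `margin`.
    p=0.5 is the worst case, so the result is conservative."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def evaluate(pairs: Sequence[Tuple[Optional[str], str]]) -> Dict:
    """pairs = (model prediction, or None if it abstained; hand-assigned label)."""
    answered = [(pred, true) for pred, true in pairs if pred is not None]
    correct = sum(pred == true for pred, true in answered)
    per_class = defaultdict(lambda: [0, 0])  # true label -> [hits, total]
    for pred, true in answered:
        per_class[true][0] += pred == true
        per_class[true][1] += 1
    recall = {c: hits / total for c, (hits, total) in per_class.items()}
    return {
        "coverage": len(answered) / len(pairs) if pairs else 0.0,
        "accuracy": correct / len(answered) if answered else 0.0,
        "accuracy_95ci": wilson_interval(correct, len(answered)),
        "recall_by_class": recall,
        "balanced_accuracy": sum(recall.values()) / len(recall) if recall else 0.0,
        "confusion": Counter(answered),  # (predicted, true) -> count
    }
```

As a rough guide, `required_sample_size(0.05)` returns 385, i.e. a few hundred hand-labelled records keep the 95% interval on accuracy within about ±5 percentage points. Smaller classes will still have wide per-class intervals, so it may be worth oversampling rare nationalities (stratified sampling) and reweighting accordingly.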