I have a nested-case control study that I have been using for analysis. At the end of my work I have deduced a set of variables that I use later to to classify new cases. One example of a simple classifier I am using is a naive Bayes, which will output simply a probability.
So here is my question:
Could I make my probabilities reflect the real world? In my specific example, the condition that I am testing for has a prevalence of 33% in my study, but a it has a population prevalence of only 10%. Bayes factors have been suggested to me as a way to achieve this, however I am little unsure how to set up the problem.
As an example I have seen a Bayes factor as a logit between the true vs. study prevalence of the outcome. The classifier however was a logistic regression, and in that case the Bayes factor was just added to the linear predictors. I think the example there was very specific, and perhaps an inappropriate method for probabilities of a naive Bayes. Instead what I did was add the logit Bayes factor to the logged probabilities, but I am also not convinced this is right either. I also think a simpler solution would be to use Bayes theorem directly, but there I am not sure how to represented my study vs.population prevalences. The method below isn't quite right, but gets at what I want:
p_final = classier_posterior*(population_prev)/(study_prev)
I should contextualize that I use the probabilities to establish a threshold for classification down stream.