Handling incomplete information properly when using Bayes' theorem

Question

Just for fun I'm trying to do some simple medical diagnosis using Bayes' theorem. Right now I'm calculating

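P(condition | symptoms) ∝ P(symptoms | condition) * P(condition)

for each possible condition, then choosing the most likely condition given the present symptoms as the "diagnosis". I drop the denominator P(symptoms) because it is the same for every condition and so does not change the ranking. Note that for simplicity's sake I assume that the symptoms are independent given the condition, so P(symptoms | condition) is just the product of the individual P(symptom | condition). This works well when I have a complete list of the probabilities P(symptom | condition) for all symptoms and conditions.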
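
To make the setup concrete, here is a minimal sketch of that calculation. The condition names, symptom names, and all probabilities are made up for illustration, and the normalization step additionally assumes the listed conditions are mutually exclusive and exhaustive.

```python
import math

# Hypothetical priors and likelihoods; every number here is invented.
PRIORS = {"A": 0.3, "B": 0.7}
LIKELIHOODS = {
    "A": {"fever": 0.8, "cough": 0.6},
    "B": {"fever": 0.1, "cough": 0.7},
}


def posteriors(symptoms, priors, likelihoods):
    """Return P(condition | symptoms), assuming independent symptoms."""
    # Unnormalized score: P(condition) * product of P(symptom | condition).
    scores = {
        c: priors[c] * math.prod(likelihoods[c][s] for s in symptoms)
        for c in priors
    }
    # The sum of the scores stands in for P(symptoms); this is only valid
    # if the conditions are mutually exclusive and cover every case.
    total = sum(scores.values())
    return {c: score / total for c, score in scores.items()}


result = posteriors(["fever", "cough"], PRIORS, LIKELIHOODS)
diagnosis = max(result, key=result.get)  # most probable condition
```

Normalizing does not change which condition wins; it only turns the scores into probabilities that sum to 1.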

However, I want to do better in the case where I do not know how likely each symptom is to occur as part of every disease. Let's say, for example, that I have a "patient" with a long list of symptoms, and two possible conditions A and B. For condition A, I have a full list of the symptoms and their probabilities, while for condition B I only know the five most common symptoms. To calculate P(condition B | symptoms), my current solution is to set P(symptom | condition B) to some base rate, e.g. 0.01, in two different situations: when I know for sure that the symptom is never caused by condition B, and when I simply don't know the real rate of the symptom under condition B.

This leads to problems: condition A will often end up as the "diagnosis" even if every P(symptom | condition A) is low, simply because more symptom probabilities are known for condition A than for condition B.
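
The effect can be reproduced with a small sketch of the base-rate fallback described above. The symptom names `s1` to `s8`, the equal priors, and all probabilities are hypothetical; only the 0.01 base rate comes from my current approach.

```python
import math

BASE_RATE = 0.01  # used both for "never caused by" and for "unknown"

# Hypothetical patient with eight symptoms, s1..s8.
symptoms = [f"s{i}" for i in range(1, 9)]

# Known P(symptom | condition); all values are invented for illustration.
known = {
    "A": {s: 0.05 for s in symptoms},  # complete list, but every rate is low
    "B": {"s1": 0.9, "s2": 0.9},       # only the most common symptoms known
}
priors = {"A": 0.5, "B": 0.5}  # equal priors, so only likelihoods matter


def score(condition):
    """Unnormalized posterior, falling back to BASE_RATE for missing rates."""
    return priors[condition] * math.prod(
        known[condition].get(s, BASE_RATE) for s in symptoms
    )


scores = {c: score(c) for c in known}
diagnosis = max(scores, key=scores.get)
```

With these numbers, condition A wins even though none of its symptom probabilities exceeds 0.05, because each of the six unknown rates for condition B is replaced by 0.01 and multiplies its score down.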

What is the best way to properly handle this uncertainty and solve the problem presented above?
