8

What is the likelihood of being infected with COVID-19 if I have 3 of the main symptoms? I'm trying to formulate this as a Bayesian statistics problem and would appreciate any suggestions.

For example, the symptoms can be fever, muscle ache, and diarrhea. This table from a 2020 study shows the percentage of infected patients (99 in total) who showed a particular symptom:

Fever                (83%)
Cough                (82%)
Shortness of breath  (31%)
Muscle ache          (11%)
Confusion            (9%)
Headache             (8%)
Sore throat          (5%)
Rhinorrhoea          (4%)
Chest pain           (2%)
Diarrhoea            (2%)
Nausea and vomiting  (1%)

In other words, given a probability $S_i$ for each symptom and a patient showing $k$ symptoms, what is the likelihood $P$ that I'm infected? Using $P(A|B) = P(B|A) \cdot P(A) / P(B)$, I think the main formulation is:

$$ P(c19 | [fever, cough, \cdots]) = P(c19 | S_1) \cdot P(c19 | S_2) \cdots P(c19 | S_k) / X $$

Here $X$ combines all $P(B)$ for each symptom, but what is $P(B)$ for each symptom? And do the probabilities simply multiply?

BruceET
3150
  • 4
    Probabilities take values in $(0,1).$ So the more of them you multiply together in your displayed equation the _smaller_ $P(C_{19})$ gets. This seems counter-intuitive. If someone exhibits all of the first four symptoms you list, it seems the probability of Covid-19 should _increase._ // In order to use Bayes Theorem you'd move from cond'l probabilities such as P(Fever|C-19) and P(Cough|C-19) to the reverse-cond'l P(C-19|List of symp). In order for that to work you need to know the population-wide prevalence P(C-19) and P(Fever|No C-19),....// _As stated,_ this does not seem a good approach. – BruceET Aug 03 '20 at 07:46
  • 4
    The given information cannot imply what you want to see. It only provides the frequency of each symptom on its own; it does not provide any pairwise or higher-order association. – TrungDung Aug 03 '20 at 07:51
  • 1
    How are you getting the conditional probabilities $P(c19|S_1)$? Your starting information seems to be $P(S_1|c19)$... – user2705196 Aug 03 '20 at 16:18
  • @BruceET I'm not sure I follow but I'm trying to formulate a simple (maybe naive) question: say a patient shows up at the dr office and shows 3/11 of the symptoms, what are the chances that person has C-19? – 3150 Aug 03 '20 at 16:36
  • "Not sure I follow." is not helpful to the discussion. Specifically, what don't you follow? The point is you can't find P(C|Symptoms) without a lot more information than you seem to have. With or without reference to your particular question, it's worth learning how to use Bayes' Theorem. – BruceET Aug 03 '20 at 16:42
  • @BruceET ahh I see, I meant this part: "In order for that to work you need to know the population-wide prevalence P(C-19) and P(Fever|No C-19)". Yes, I'm trying to learn Bayes' Theorem, which led me to the original question, but it seems the question is not as clear as I thought. – 3150 Aug 03 '20 at 16:48
  • 1
    P(C-19|Fever) depends on the population of people who have a fever and you give no information about them. – BruceET Aug 03 '20 at 16:56

3 Answers

6

You are asking about conditional independence: $S_1 \; {\rm INDEP} \; S_2 \mid C19$. Writing the joint probability as a product over the probabilities of each feature is a model that assumes conditional independence.

You can check whether this assumption holds by comparing the joint distributions of pairs of input variables, for each of the possible outcomes of $C19$.
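As a rough illustration of that check, here is a minimal R sketch. It assumes you have patient-level records, which the table in the question does not provide, so the data below are simulated and purely hypothetical; `check_cond_indep` is an illustrative helper, not an existing function. Within each C19 group it compares the observed joint frequency of two symptoms with the product of their marginal frequencies. A passing pairwise check does not establish mutual conditional independence.

```r
# Hypothetical patient-level records (simulated; NOT real data).
set.seed(1)
n <- 5000
c19 <- rbinom(n, 1, 0.3)
# Symptoms are drawn independently given infection status,
# so here the check is expected to pass.
fever <- rbinom(n, 1, ifelse(c19 == 1, 0.83, 0.05))
cough <- rbinom(n, 1, ifelse(c19 == 1, 0.82, 0.10))

# Within each C19 group, compare the observed joint P(s1, s2 | C19)
# with the product of marginals P(s1 | C19) * P(s2 | C19).
check_cond_indep <- function(s1, s2, group) {
  sapply(split(data.frame(s1, s2), group), function(d) {
    joint <- mean(d$s1 == 1 & d$s2 == 1)
    product <- mean(d$s1 == 1) * mean(d$s2 == 1)
    c(joint = joint, product = product, diff = joint - product)
  })
}

res <- check_cond_indep(fever, cough, c19)
print(round(res, 3))  # one column per C19 outcome ("0" and "1")
```

A `diff` close to zero in both columns is consistent with conditional independence for that pair; a large gap in either column suggests the product form is a poor model.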

Match Maker EE
  • 5
    (+1) A very important point. But note that even pairwise (conditional) independence doesn't imply mutual (conditional) independence: https://stats.stackexchange.com/q/180708/17230 – Scortchi - Reinstate Monica Aug 03 '20 at 14:33
  • @Match Maker EE I'm not sure there's an answer here :) care to rephrase? I realize my question maybe is not as clear, simply it's asking: what are the chances of infection given say 3/11 symptoms – 3150 Aug 03 '20 at 16:43
  • 1
    @3150: This is (another) reason why your question's unanswerable from the data supplied. Not uncommonly, it's a tell-tale *combination* of certain symptoms (& the absence of others) that leads to a diagnosis: the assumption of mutual conditional independence in this case would need a justification from medical science. – Scortchi - Reinstate Monica Aug 03 '20 at 18:37
  • 1
    A little context: in medical decision making often the assumption of conditional independence is made. Many 'add-up' simple scoring models are being used in daily practice by clinicians, with much success! That the scores add up is because the sum is in the log-space, whereas your probability terms are in the normal space. Recall that ln(a*b) = ln(a) + ln(b). As ln(x) is monotonic, a threshold probability in the probability space transforms directly to log-space. So physicians know when the symptoms clearly indicate disease (sum of log-scores > threshold), and when not. – Match Maker EE Aug 03 '20 at 20:20
4

Many clinics and medical offices scan for temperature when you come in the door as a way to detect people who might have C-19, in order to take special precautions if necessary. I have no idea what the actual probabilities are, but here is how Bayes' Theorem would be used in order to find P(C-19 | Fever), denoted $P(C|F)$ below.

$$\begin{aligned}
P(C|F) &= \frac{P(CF)}{P(F)} = \frac{P(C)P(F|C)}{P(CF)+P(C^cF)}\\
&= \frac{P(C)P(F|C)}{P(C)P(F|C)+P(C^c)P(F|C^c)}.
\end{aligned}$$

So in order to find $P(C|F),$ you need to know all of the probabilities in the last expression. Right now, where I live, just knowing the prevalence $P(C)$ seems difficult. And if $P(F|C^c)$ gets too large (that is, lots of people have fever for reasons unrelated to Covid-19), then the temperature scans as people come in the door become useless as a quick screen for Covid-19.

However, if you had information to be sure $P(C|F) > P(C),$ then you'd know temperature scans are useful, and maybe you can get an even larger probability of $C$ given a longer list of symptoms.
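To make the formula concrete, here is a small R sketch with placeholder numbers. Only the 83% fever rate comes from the table in the question; the prevalence and the fever rate among uninfected people are assumptions chosen for illustration, not estimates. The function name `posterior_given_symptom` is likewise just an illustrative choice.

```r
# Bayes' theorem for a single binary symptom.
# prior          : P(C), prevalence in the screened population
# p_sym_given_c  : P(F | C)
# p_sym_given_nc : P(F | C^c)
posterior_given_symptom <- function(prior, p_sym_given_c, p_sym_given_nc) {
  numerator <- prior * p_sym_given_c
  numerator / (numerator + (1 - prior) * p_sym_given_nc)
}

p_c <- 0.01           # ASSUMED prevalence (illustrative only)
p_f_given_c <- 0.83   # fever rate among infected patients, from the question's table
p_f_given_nc <- 0.05  # ASSUMED fever rate among the uninfected (illustrative only)

p_c_given_f <- posterior_given_symptom(p_c, p_f_given_c, p_f_given_nc)
print(p_c_given_f)        # roughly 0.14 with these inputs
print(p_c_given_f > p_c)  # TRUE here, because P(F|C) > P(F|C^c)
```

Raising `p_f_given_nc` toward `p_f_given_c` pulls the posterior back toward the prior, which is the sense in which the screen becomes useless.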

BruceET
2

Here's the way I would do it (check it's correct!)

If you have more than one symptom, you apply Bayes' theorem recursively, using the previous result as the new prior probability. Here's an example (works in R). I made up some starting values without thinking too much about them:

p_fever_C19 = 0.83   # P of fever given you have C19
p_fever_noC19 = 0.01 # P of fever given you DO NOT have C19
p_C19 = 0.0001       # P of C19 without any information about symptoms (i.e. P of C19 in the general population) 

If you have fever your probability of having C19 is:

p_C19_given_fever = (p_C19 * p_fever_C19) / 
                    ((p_C19 * p_fever_C19) + ((1-p_C19) * p_fever_noC19))

Let's say you now also have a cough. Instead of p_C19, use the previous p_C19_given_fever:

p_cough_C19 = 0.82
p_cough_noC19 = 0.1

p_C19_given_fever_cough = (p_C19_given_fever * p_cough_C19) / 
                          ((p_C19_given_fever * p_cough_C19) + ((1-p_C19_given_fever) * p_cough_noC19))

You also have shortness of breath:

p_short_C19 = 0.31
p_short_noC19 = 0.2

p_C19_given_fever_cough_short = (p_C19_given_fever_cough * p_short_C19) / 
                                ((p_C19_given_fever_cough * p_short_C19) + ((1-p_C19_given_fever_cough) * p_short_noC19))

.. and so on for other symptoms. For this example the results are:

p_C19_given_fever
0.008232494
p_C19_given_fever_cough
0.06372898
p_C19_given_fever_cough_short
0.09543484
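The same recursion can be written as a loop over any number of symptoms. The sketch below is one possible way to do that, reusing the made-up values from the example above; `update_posterior` is an illustrative name. It relies on the symptoms being conditionally independent given infection status, which is a strong assumption and not something the table in the question can confirm. Under that assumption the step-by-step update agrees with the one-shot odds form, posterior odds = prior odds × product of likelihood ratios.

```r
# Sequential Bayes update over several symptoms, ASSUMING the symptoms are
# conditionally independent given infection status.
update_posterior <- function(prior, p_given_c, p_given_nc) {
  # Each Reduce step treats the previous posterior as the new prior.
  Reduce(function(p, i) {
    p * p_given_c[[i]] / (p * p_given_c[[i]] + (1 - p) * p_given_nc[[i]])
  }, seq_along(p_given_c), accumulate = TRUE, prior)[-1]  # drop the initial prior
}

# Made-up values, same as in the example above.
p_given_c  <- c(fever = 0.83, cough = 0.82, short_breath = 0.31)
p_given_nc <- c(fever = 0.01, cough = 0.10, short_breath = 0.20)

sequential <- update_posterior(0.0001, p_given_c, p_given_nc)
print(sequential)  # posterior after 1, 2 and 3 symptoms

# Equivalent one-shot form: posterior odds = prior odds * product of likelihood ratios.
prior_odds <- 0.0001 / (1 - 0.0001)
post_odds <- prior_odds * prod(p_given_c / p_given_nc)
one_shot <- post_odds / (1 + post_odds)
print(one_shot)    # agrees with the last element of `sequential`
```

With these inputs the three values in `sequential` reproduce the results listed above. If the symptoms are correlated among infected or uninfected people, neither form is valid as written.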
dariober
  • 1
    You're tacitly assuming e.g. P(cough | C19) = P(cough | C19 & fever) – Scortchi - Reinstate Monica Aug 03 '20 at 15:07
  • @dariober "If you have more than one symptom you apply the Bayesian theorem recursively". This bit sounds like the key part to combine multiple probabilities, is this rule always applicable? – 3150 Aug 03 '20 at 16:53