I understand how to use NB and have used it often. However, I am trying to understand how the two different ways I use to calculate the evidence, P(E), result in the same figure.
The simplest way I calculate P(E) is the most straightforward:
number of times the evidence appears in the dataset / dataset size
However, I have also been taught and use another method:
P(E) = P(E|H)P(H) + P(E|¬H)P(¬H)
where ¬ means NOT.
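For concreteness, here is a minimal sketch of both calculations on a made-up dataset. The rows and variable names are hypothetical, chosen only for illustration, and exact fractions are used so the two results can be compared without rounding error.

```python
from fractions import Fraction

# Hypothetical toy dataset: each row is (hypothesis_true, evidence_present).
rows = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
    (False, False), (True, True),
]

n = len(rows)
n_h = sum(1 for h, _ in rows if h)             # rows where H holds
n_not_h = n - n_h                              # rows where H does not hold
n_e = sum(1 for _, e in rows if e)             # rows where E appears
n_e_and_h = sum(1 for h, e in rows if h and e)
n_e_and_not_h = n_e - n_e_and_h

# Method 1: count of evidence / dataset size.
p_e_direct = Fraction(n_e, n)

# Method 2: P(E) = P(E|H)P(H) + P(E|¬H)P(¬H).
p_h = Fraction(n_h, n)
p_not_h = Fraction(n_not_h, n)
p_e_given_h = Fraction(n_e_and_h, n_h)
p_e_given_not_h = Fraction(n_e_and_not_h, n_not_h)
p_e_total = p_e_given_h * p_h + p_e_given_not_h * p_not_h

print(p_e_direct, p_e_total)
```

On this toy data both calculations come out to 1/2.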
The two methods give the same answer, but I struggle to see why they do, and why the second would be used when the first is simpler. I think of the second method as normalising (that makes sense to me); I don't see how it equates to the evidence, though.