While reading this post, I came across this claim:
"In practice, however, it’s better to model Σ(X) as log Σ(X), as it is more numerically stable to take exponent compared to computing log."
But there is no explanation of why this is true. Can anyone explain?
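A minimal sketch of one way the quoted claim could play out. It assumes (not stated in the post) the usual VAE setup, where Σ(X) is a diagonal covariance produced by an encoder and the loss includes the per-dimension KL divergence to N(0, 1), which contains a log σ² term. If the network outputs the variance directly, that term needs a log, which breaks once the variance underflows to 0. If the network outputs log σ² instead, the log term is just the network output, and the only transcendental call is exp, which is always non-negative. The function names are hypothetical.

```python
import math

def kl_from_variance(mu, var):
    # KL(N(mu, var) || N(0, 1)) for one dimension, computed from the variance.
    # Needs var > 0: math.log raises ValueError if var has underflowed to 0.0.
    return 0.5 * (var + mu * mu - 1.0 - math.log(var))

def kl_from_log_variance(mu, log_var):
    # Same KL, but the (hypothetical) network outputs log_var = log(var).
    # No log is taken: exp(log_var) >= 0, and the log term is log_var itself.
    return 0.5 * (math.exp(log_var) + mu * mu - 1.0 - log_var)

# For ordinary values both forms agree (up to rounding).
print(kl_from_variance(0.5, 2.0), kl_from_log_variance(0.5, math.log(2.0)))

# For a very small variance, exp(-800.0) underflows to 0.0 in double precision:
print(kl_from_log_variance(0.0, -800.0))  # finite: 399.5
# kl_from_variance(0.0, math.exp(-800.0)) would raise ValueError (log of 0.0).
```

The same reasoning covers positivity: an unconstrained network output can be negative, which is not a valid variance, whereas exp of any real number is.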