I am trying to compute a Bayesian posterior distribution, given a large number of data points. I’ve found that as the number of data points increases, the joint likelihood of the data underflows to zero.
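For concreteness, here is a minimal sketch of the underflow I mean, using a synthetic Gaussian likelihood (the distribution and sample size are placeholders, not my actual model):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100_000)  # synthetic data

# Joint likelihood as a raw product of per-point densities:
# every factor is well below 1, so the product collapses to 0.0 in float64.
pointwise = stats.norm.pdf(data, loc=0.0, scale=1.0)
joint_likelihood = np.prod(pointwise)
print(joint_likelihood)  # 0.0 -- underflow
```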
The common solution to this problem is to work with the logarithm, so that the product of likelihoods becomes a sum of log likelihoods; in principle, this transformation should avoid underflow. However, with my large number of data points, I find that the log-likelihood approach still fails: instead of the product underflowing to zero, the sum of many large negative values overflows to negative infinity.
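As a toy reproduction (again with a placeholder Gaussian likelihood, not my real model): in this toy case the negative infinity appears because a single per-point density has already underflowed to zero before the log is taken, which may or may not be the mechanism in my actual model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(size=100_000)
data[0] = 50.0  # a single extreme point, purely for illustration

# Taking the log *after* evaluating the density: the extreme point's
# density underflows to 0.0, log(0.0) = -inf, and the whole sum is -inf.
log_lik_naive = np.sum(np.log(stats.norm.pdf(data)))
print(log_lik_naive)  # -inf

# Evaluating the log density directly keeps every term finite.
log_lik = np.sum(stats.norm.logpdf(data))
print(log_lik)  # a large negative, but finite, number
```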
In addition to using the log likelihood, I have looked into the log-sum-exp trick, which is frequently used to avoid underflow when computing the Bayesian evidence integral. However, in my case, I do not need to calculate the evidence term. I have also considered the possibility that my model is mis-specified, but loosening my prior distributions does not overcome the downward pressure of the large data set, and the form of my likelihood function is well motivated for these data.
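For completeness, this is my understanding of the log-sum-exp trick, shown on a toy grid approximation of the evidence (the Gaussian model, grid, and uniform prior over the grid are all placeholder assumptions):

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(0)
data = rng.normal(loc=0.3, size=10_000)

# Grid of candidate means and a uniform prior over the grid (toy setup).
mu_grid = np.linspace(-2.0, 2.0, 201)
log_prior = np.full(mu_grid.shape, -np.log(mu_grid.size))

# Log likelihood of the full data set at each grid point.
log_lik = np.array([stats.norm.logpdf(data, loc=mu).sum() for mu in mu_grid])

# Naive evidence: exponentiating large negative values underflows to 0.0.
naive_evidence = np.sum(np.exp(log_lik + log_prior))

# log-sum-exp stays in log space (it subtracts the max internally),
# so the log evidence remains finite.
log_evidence = logsumexp(log_lik + log_prior)
print(naive_evidence, log_evidence)
```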
Is there something that I am missing here? How do probabilistic programming libraries, like PyMC, avoid underflow when calculating the likelihood over a large number of data points? I've linked to some related pages below, but none specifically addresses the underflow of both the likelihood and the log likelihood in the face of a large dataset.
[1] Example of how the log-sum-exp trick works in naive Bayes
[2] Why we consider log likelihood instead of likelihood in Gaussian distribution