
I have referred to this post here. In my case, I am trying to normalize log-likelihoods obtained by evaluating different parameter values on the same data set. However, the approach in the top answer did not work out for me. Below are n = 100 of my log-likelihoods.

ll <-
c(-35480.7118796417, -2739.42871153509, -8362.32845821982, -8365.34260282213, 
-9094.89673916007, -21270.2045379339, -12580.518821136, -13783.8037118782, 
-42712.3278071616, -423837.36571571, -1805.78632985328, -2002.85063426255, 
-3762.76492409086, -6849.63087008593, -11956.7406803718, -20976.5170841893, 
-135949.201113163, -31498.4171087018, -15446.7854298957, -5120.27102265426, 
-27618.8855708366, -1181406.765207, -80867.7262649035, -816098.492721317, 
-10945.0893400045, -1628417.58006139, -16482.9909028172, -65197.3708076204, 
-67871.4208617162, -77627.4857483449, -17479.5870536702, -93579.8562141009, 
-5281.94699357017, -5554.87325489942, -15628.8543639877, -10632.6082887688, 
-6489.97791129332, -11988.4625382098, -9170.39131516501, -9339.36646353115, 
-113156.200514024, -2567.90247055723, -14467.0183704268, -48650.137758317, 
-10495.0548182032, -2889.13562818923, -115485.747589673, -80143.5433253561, 
-5428.78386244051, -15449.3451561719, -25609.2870701355, -3832.78432518107, 
-24152.3367051411, -2272.05842004831, -3184.3778786079, -2727.35587110321, 
-2416.5798834696, -31118.8156319793, -51467.1989404669, -1847.91304055039, 
-75179.9905124732, -55387.6390729858, -6217.46116602578, -80167.2789661716, 
-2262.83239548126, -7967.86416890192, -203981.847720651, -5348.68097367683, 
-35539.3623924968, -29740.7965097703, -145107.587483733, -16038.5130132878, 
-30534.4376362393, -164907.15448187, -18269.9830224852, -10168.7453829144, 
-86431.7638673747, -30391.9503748769, -2836.02770007675, -5544.75647729347, 
-4392.30776454182, -215937.960800752, -28141.7740166936, -7439.99455094574, 
-13708.5567616969, -9908.51651663084, -46267.2625988753, -5158.62658552135, 
-5521.97889928275, -1485.65777356592, -464662.123874468, -30866.4651125434, 
-14037.8604686386, -4754.14972171724, -13552.1743710923, -4674.74137872854, 
-6890.68419568385, -12060.9906749138, -5763.51938836069, -3472.01636104686
)

# Subtract the maximum of the log-likelihoods from everything
ll2 <- ll - max(ll)
# Throw away anything that is < log(10e-16/n)
ll3 <- subset(ll2, ll2 > log(10e-16/100))
> exp(ll3)
[1] 1

When I calculate ll3, all but one of my log-likelihoods get thrown away, which is problematic. Even after subtracting the maximum from ll, the resulting ll2 still contains values that are extremely negative, so exponentiating them gives 0.
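
For illustration (these two numbers are only meant to show where double precision underflows; they are not from my data):

exp(-700)    # about 1e-304, still representable as a double
exp(-1000)   # exactly 0 -- underflows in double precision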

Adrian
  • Why do you need to exponentiate them? Maybe that’s where the answer is. – hejseb Jan 15 '18 at 07:04
    As explained in Gordon Smyth's answer, your largest likelihood is $10^{139}$ times higher than any of the others -- is there any practical difference to $\infty$ times higher? If you think yes, you may need to explain the intended use-case – Juho Kokkala Jan 15 '18 at 08:05

1 Answer


You haven't explained why you want to convert log-likelihoods into unlogged probabilities that add to one. Contexts in which you would need to convert log-likelihoods into weights like this are rare.

Anyway, you can get the probabilities simply by:

> p <- exp(ll - max(ll))
> p <- p/sum(sort(p))

It is impossible to do any better than this if you want unlogged numbers.

From the log-likelihoods you give in your post, it is obvious that the largest probability will be exactly 1 and most of the rest will be exactly 0. The largest log-likelihood that you give is -1485.658. The next largest is -1805.786. Now $$\exp(1805.786-1485.658) \approx 10^{139},$$ so your largest (unlogged) likelihood is bigger than the second largest by 139 orders of magnitude! This shows that the second largest normalized probability will be about $10^{-139}$.

By the same reasoning, the 3rd largest $p$ will be about $10^{-157}$, the 4th largest will be about $10^{-225}$, and all the remainder will be zero because numbers below about $10^{-324}$ (the smallest subnormal double) underflow to zero in double-precision arithmetic.
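
As a quick check (a sketch using the `ll` vector from the question), the gaps between the largest log-likelihood and the next three can be read off directly in orders of magnitude:

top <- sort(ll, decreasing = TRUE)[1:4]
(top[1] - top[2:4]) / log(10)   # about 139, 157 and 225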

On the other hand, you could compute non-zero probabilities for all the log-likelihoods if you were willing to represent them on the natural log-scale. For this data, the log-probabilities are simply

> logp <- ll - max(ll)

correct to machine accuracy. If the largest log-likelihood wasn't so dominant, then you would need to add an extra step:

> sump <- sum(sort(exp(logp)))
> logp <- logp - log(sump)
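
This extra step is the standard "log-sum-exp" trick. A minimal helper that does both steps at once (a sketch, not part of the original code) could be:

log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))   # largest term becomes exp(0) = 1, so the sum neither overflows nor collapses to 0
}
logp <- ll - log_sum_exp(ll)   # normalized log-probabilities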

From a statistical inference point of view, it all seems very simple. One log-likelihood is very much larger than all the others. What could be simpler?

Assuming that the statistical models from which the log-likelihoods have been computed are comparable in some sense, the parameter values that lead to the largest log-likelihood are the only ones that can be entertained. All the other parameter combinations lead to log-likelihoods that are so small they can be ruled out (unless the prior probabilities are truly dramatic, and you don't seem to have prior probabilities anyway).

Gordon Smyth
  • I see. So in this case there is no way to normalize this set of log-likelihoods? – Adrian Jan 15 '18 at 07:52
  • Re the first paragraph: does the question state these are from different datasets ? – Juho Kokkala Jan 15 '18 at 08:02
  • No, these values are what I obtain after plugging in different parameter values into a function that returns the log likelihood. – Adrian Jan 15 '18 at 08:03
  • @GordonSmyth The previous post is from 2013 and by a different OP (this issue may arise, say, in computing a conditional distribution for Gibbs sampling over a finite set, or for normalizing particle filter weights) – Juho Kokkala Jan 15 '18 at 08:07
  • I'm computing different log-likelihoods for the same dataset. Sorry for the confusion. – Adrian Jan 15 '18 at 08:10
  • @JuhoKokkala Thanks, I overlooked it was a different OP. I've edited my answer. – Gordon Smyth Jan 15 '18 at 08:20
  • For the record, scientific numerical libraries typically have a "logsumexp" function (or similarly named) for calculating the `sump` value. eg: in [pytorch](https://pytorch.org/docs/stable/generated/torch.logsumexp.html) and [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.logsumexp.html). – drevicko Nov 16 '20 at 01:26
  • @GordonSmyth Why do you say that needing to convert log-likelihoods to probabilities is rare? Isn't this done when normalizing log-likelihoods calculated from Bayes' theorem? – Eric Roller Dec 16 '20 at 17:55