
I am currently doing a multiclass classification task on sequence data and am using tf.contrib.crf.crf_log_likelihood to compute sentence-level log-likelihood values.

In particular, it implements a linear-chain CRF, where the log-likelihood of a sentence is computed by summing the unary and binary scores along the gold tag sequence and normalising by subtracting the log-sum-exp over the alpha values from the forward pass.

As far as I understand, the output of the above function corresponds to formula (13) in Natural Language Processing (Almost) from Scratch (Collobert et al., 2011).
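My reading of that formula, writing $A_{i,j}$ for the binary (transition) scores and $f_{t,k}$ for the unary score the network assigns to tag $k$ at position $t$ of a sentence of length $T$, is

$$\log p(y \mid x) \;=\; \sum_{t=1}^{T} f_{t,\,y_t} \;+\; \sum_{t=2}^{T} A_{y_{t-1},\,y_t} \;-\; \log\sum_{y'}\exp\left(\sum_{t=1}^{T} f_{t,\,y'_t} \;+\; \sum_{t=2}^{T} A_{y'_{t-1},\,y'_t}\right),$$

and since the normaliser sums over every possible tag sequence, including the gold one, this quantity should never be positive.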

Furthermore, in my system the average negative log-likelihood over each batch is minimised as the training objective, and the variables are updated with tf.train.AdamOptimizer.
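In code, the relevant part looks roughly like this (simplified sketch; the placeholder shapes, tag count, and default Adam settings are illustrative, not my exact configuration):

```python
import tensorflow as tf

num_tags = 10  # illustrative; not my actual tag set size

# Unary (emission) scores from the network: [batch_size, max_seq_len, num_tags]
unary_scores = tf.placeholder(tf.float32, [None, None, num_tags])
# Gold tag indices: [batch_size, max_seq_len]
tag_indices = tf.placeholder(tf.int32, [None, None])
# True (unpadded) length of each sequence: [batch_size]
seq_lengths = tf.placeholder(tf.int32, [None])

# Per-sentence log-likelihood: gold-path score (unary + binary)
# minus the log-sum-exp normaliser from the forward pass.
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    unary_scores, tag_indices, seq_lengths)

# Average negative log-likelihood over the batch as the training objective.
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer().minimize(loss)
```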

After training for roughly 2 epochs, the maximum log-likelihood value (over the batch) starts to become positive.

How could this happen? Would this not entail a probability greater than 1?
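To make the question concrete, here is a small self-contained check I would expect to stay non-positive (the tiny shapes, random scores, and gold tags are arbitrary, and the comparison against brute-force enumeration is just a sanity check of my own):

```python
import itertools

import numpy as np
import tensorflow as tf

# Tiny arbitrary example: 1 sentence of length 3 with 4 possible tags.
np.random.seed(0)
seq_len, num_tags = 3, 4
unary = np.random.randn(1, seq_len, num_tags).astype(np.float32)
gold = np.array([[1, 0, 3]], dtype=np.int32)
lengths = np.array([seq_len], dtype=np.int32)

with tf.Graph().as_default(), tf.Session() as sess:
    ll_op, transitions_op = tf.contrib.crf.crf_log_likelihood(
        tf.constant(unary), tf.constant(gold), tf.constant(lengths))
    sess.run(tf.global_variables_initializer())  # initialises the transition matrix
    ll, A = sess.run([ll_op, transitions_op])

# Brute force: score every possible tag sequence and normalise explicitly.
def path_score(tags):
    score = sum(unary[0, t, tags[t]] for t in range(seq_len))
    score += sum(A[tags[t - 1], tags[t]] for t in range(1, seq_len))
    return score

all_scores = [path_score(p)
              for p in itertools.product(range(num_tags), repeat=seq_len)]
log_norm = np.log(np.sum(np.exp(all_scores)))
brute_ll = path_score(gold[0]) - log_norm

print(ll[0], brute_ll)  # should agree, and neither should be positive
```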

hatero
  • Since a likelihood is typically constructed from probabilities *and probability densities*, it sounds like you might need to read https://stats.stackexchange.com/questions/4220/. The context suggests it's possible you are truly dealing only with probabilities, in which case you should verify your software really is computing the likelihood: for computational efficiency, many programs do not compute constants that would not vary with the parameters. Thus what they produce is the log likelihood *up to some additive constant.* – whuber May 16 '17 at 18:48
  • I have looked through the code, but I could not find any indication of an additive constant. – hatero May 16 '17 at 19:01
  • The maximum log-likelihood per batch is now fluctuating between awfully suspicious values, e.g. 0.0, 0.125, 1.5, 2.5, 4, 0.625, 0.03125, ... (not in that specific order). Maybe the cause of my problem is numerical in nature? – hatero May 16 '17 at 21:04

0 Answers