I have two questions about the log-likelihood computation in semi-supervised Naive Bayes.
I have read in several documents online that, in every EM iteration of semi-supervised Naive Bayes, the log-likelihood increases (i.e., the difference between consecutive iterations is positive). Is this always true? In my text classification problem I am getting the following log-likelihoods:
```
                    previous loglh     current loglh      diff
M: #iteration 2     -36268.3096003 ->  -89209.1178494   (-52940.8082491 )
M: #iteration 3     -89209.1178494 ->  -34633.3568107   ( 54575.7610387 )
M: #iteration 4     -34633.3568107 ->  -38624.6148215   ( -3991.25801086)
M: #iteration 5     -38624.6148215 ->  -32929.3134083   (  5695.30141321)
M: #iteration 6     -32929.3134083 ->  -36901.1324845   ( -3971.81907618)
M: #iteration 7     -36901.1324845 ->  -33105.8190786   (  3795.31340593)
M: #iteration 8     -33105.8190786 ->  -35887.8113077   ( -2781.99222912)
M: #iteration 9     -35887.8113077 ->  -33249.0299832   (  2638.78132451)
M: #iteration 10    -33249.0299832 ->  -35094.6821847   ( -1845.65220157)
M: #iteration 11    -35094.6821847 ->  -33459.5111152   (  1635.17106958)
M: #iteration 12    -33459.5111152 ->  -34587.8807293   ( -1128.36961412)
M: #iteration 13    -34587.8807293 ->  -33661.1108938   (   926.769835475)
M: #iteration 14    -33661.1108938 ->  -34252.017022    (  -590.906128148)
M: #iteration 15    -34252.017022  ->  -33804.2917848   (   447.72523711)
M: #iteration 16    -33804.2917848 ->  -34025.8914036   (  -221.599618742)
M: #iteration 17    -34025.8914036 ->  -33851.2573206   (   174.634083003)
M: #iteration 18    -33851.2573206 ->  -33911.2395915   (   -59.9822709405)
M: #iteration 19    -33911.2395915 ->  -33871.2589912   (    39.980600331)
M: #iteration 20    -33871.2589912 ->  -33843.8767245   (    27.3822666886)
```
As you can see, the log-likelihood improves in some iterations and degrades in others, and the two cases alternate, which I find really strange.
If $L$ ($U$) is the number of labeled (unlabeled) documents, $C$ the number of classes, and $\text{class}_{d_i}$ the class of labeled document $i$, I compute the total log-likelihood as the sum of the two terms below. Is this computation correct? $$ \begin{aligned} \text{loglik}(h_{\text{labeled}}) &= \sum_{i=1}^{L} \log\big( \text{prob}(\text{class}_{d_i}) \cdot \text{prob}(d_i \mid \text{class}_{d_i})\big) \\ \text{loglik}(h_{\text{unlabeled}}) &= \sum_{i=1}^{U} \sum_{j=1}^{C} \log\big( \text{prob}(\text{class}_j) \cdot \text{prob}(d_i \mid \text{class}_j)\big) \end{aligned} $$
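For concreteness, here is a minimal sketch of how I evaluate these two sums (the array names `log_prior`, `log_cond_labeled`, `log_cond_unlabeled` are just placeholders for the model parameters my code already has in log space):

```python
import numpy as np

def total_log_likelihood(log_prior, log_cond_labeled, labels, log_cond_unlabeled):
    """Sum of the labeled and unlabeled terms exactly as in the formulas above.

    log_prior          : shape (C,)   -- log prob(class_j)
    log_cond_labeled   : shape (L, C) -- log prob(d_i | class_j) for labeled docs
    labels             : shape (L,)   -- class index of each labeled doc
    log_cond_unlabeled : shape (U, C) -- log prob(d_i | class_j) for unlabeled docs
    """
    rows = np.arange(len(labels))
    # labeled term: sum_i log( prob(class_{d_i}) * prob(d_i | class_{d_i}) )
    loglik_labeled = np.sum(log_prior[labels] + log_cond_labeled[rows, labels])
    # unlabeled term, as written above: sum_i sum_j log( prob(class_j) * prob(d_i | class_j) )
    loglik_unlabeled = np.sum(log_prior[None, :] + log_cond_unlabeled)
    return loglik_labeled + loglik_unlabeled
```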