
I am trying to determine parameters in a Bayesian network with two latent variables (in blue).

DAG (not enough reputation to post an image)

Every variable is discrete with 2-4 categories. The latent variables have 3 categories each.

I am trying to find an R package that will let me define the DAG and then learn the conditional probability parameters from data:

    B C D E F G H I J
    2 1 ? 3 2 ? 1 4 4
    2 2 ? 3 2 ? 1 3 2
    2 1 ? 3 2 ? 1 4 2
    3 3 ? 3 2 ? 1 4 2
    3 1 ? 3 2 ? 1 1 4
    3 3 ? 3 2 ? 1 4 1
    ...

I have tried working with:

  1. bnlearn: as far as I can tell, its parameter estimation doesn't support latent variables.
  2. catnet: I can't really tell what is going on here; the results from its parameter estimation tools haven't worked out the way I expected.
  3. gRain: I spent a lot of time with this, but couldn't work out how to generate the CPT values when one or more of the variables in my DAG is latent.

I also spent a long time trying to work out how to use rjags, but ended up giving up. The tutorials I found for parameter estimation all assumed either:

  1. There are no latent variables
  2. We were trying to learn the structure, rather than just determine conditional probabilities given a fixed structure.

I'd really like to find a solid R package for tackling this problem, though a Java/Scala solution would work as well. A tutorial/example problem would be even better.

I have a vague impression that expectation-maximization techniques can be used, and I even started trying to write code for the problem myself by extending the cluster-graph code in bayes-scala, but I quickly realized that this was a bigger project than I initially thought.
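
To make the EM idea concrete, here is a minimal, self-contained sketch of roughly what I understand EM to be doing, in plain Scala with no library. All names and the toy data are hypothetical, not from my actual DAG: a single 3-state latent variable Z with two observed binary children X1 and X2. The E-step computes responsibilities P(Z | x1, x2); the M-step re-estimates P(Z) and P(Xi | Z) from expected counts.

```scala
// Minimal EM sketch for one latent variable Z (3 states) with two observed
// binary children X1 and X2. All names and the toy data are hypothetical.
val K = 3
val data = Seq((0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (1, 0), (0, 0), (1, 1))

def normalize(a: Array[Double]): Array[Double] = { val s = a.sum; a.map(_ / s) }

// Deliberately asymmetric initialisation: a perfectly uniform start can
// leave EM stuck at flat marginals.
var pZ  = normalize(Array(0.5, 0.3, 0.2))
var pX1 = Array(Array(0.6, 0.4), Array(0.3, 0.7), Array(0.5, 0.5))
var pX2 = Array(Array(0.7, 0.3), Array(0.4, 0.6), Array(0.55, 0.45))
var logLik = 0.0

for (_ <- 1 to 50) {
  // E-step: responsibilities P(Z = k | x1, x2) for each sample
  val resp = data.map { case (x1, x2) =>
    normalize(Array.tabulate(K)(k => pZ(k) * pX1(k)(x1) * pX2(k)(x2)))
  }
  // M-step: re-estimate parameters from expected counts (a tiny
  // pseudo-count avoids zero probabilities)
  pZ = normalize(Array.tabulate(K)(k => resp.map(_(k)).sum))
  pX1 = Array.tabulate(K) { k =>
    normalize(Array.tabulate(2) { v =>
      data.zip(resp).collect { case ((x1, _), r) if x1 == v => r(k) }.sum + 1e-9
    })
  }
  pX2 = Array.tabulate(K) { k =>
    normalize(Array.tabulate(2) { v =>
      data.zip(resp).collect { case ((_, x2), r) if x2 == v => r(k) }.sum + 1e-9
    })
  }
  logLik = data.map { case (x1, x2) =>
    math.log((0 until K).map(k => pZ(k) * pX1(k)(x1) * pX2(k)(x2)).sum)
  }.sum
}
println(f"final log-likelihood: $logLik%.4f")
```

Scaling this up to my full DAG (with parents, more states, and loopy inference) is exactly the part that turned out to be a bigger project than expected.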

Bayes-Scala
I worked with bayes-scala some more and found a video from a Coursera class defining cluster graphs: http://www.youtube.com/watch?v=xwD_B31sElc

I'll quickly walk through what I think I learned. First off, the full DAG I am dealing with is http://i.imgur.com/tnVPFGM.png (still not enough reputation for images or more links).

Most of the language dealing with cluster graphs refers to factors in factor graphs, so I tried to translate this DAG into a factor graph (http://i.imgur.com/V54XdTV.png). I'm not certain about this, but I think the factor graph is a valid transformation of the DAG.

Then a simple cluster graph I could make is the Bethe cluster graph (http://i.imgur.com/hHb7eQ7.png). The Bethe cluster graph is guaranteed to satisfy the constraints on cluster graphs, namely:

1. Family Preservation: for each factor $\Phi_k$ there must exist a cluster $C_i$ such that $\mathrm{scope}(\Phi_k) \subseteq C_i$.

2. Running Intersection Property: for each pair of clusters $C_i, C_j$ and variable $X_i \in C_i \cap C_j$, there exists a unique path between $C_i$ and $C_j$ such that all clusters along the path contain $X_i$.
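
As a sanity check on the first property I wrote a tiny helper (hypothetical, plain Scala, not part of bayes-scala). It also shows why the Bethe construction satisfies family preservation trivially: every factor scope gets its own cluster.

```scala
// Hypothetical helper (not part of bayes-scala): family preservation holds
// when every factor scope fits inside at least one cluster.
def familyPreserved(factorScopes: Seq[Set[Int]], clusters: Seq[Set[Int]]): Boolean =
  factorScopes.forall(scope => clusters.exists(c => scope.subsetOf(c)))

// Bethe construction: one cluster per factor scope plus one singleton
// cluster per variable, so family preservation holds by construction.
val scopes = Seq(Set(1, 2), Set(2, 3), Set(3, 4))
val betheClusters = scopes ++ scopes.flatten.distinct.map(Set(_))
println(familyPreserved(scopes, betheClusters)) // true
```

The running intersection property is the harder one to check mechanically, and it is the one I suspect I violated in my hand-built graphs.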

However, I think the Bethe cluster graph loses the covariability conferred by the original DAG. So I tried to construct what I believe is a valid cluster graph that still maintains information about covariance between variables:
http://i.imgur.com/54gDJhl.png

It's possible that I messed something up in this cluster graph, but I was able to write it in bayes-scala (http://pastebin.com/82ErCSQT). After I do so, the log-likelihoods progress as one would expect:

EM progress(iterNum, logLikelihood): 1, -2163.880428036244
EM progress(iterNum, logLikelihood): 2, -1817.7287344711604

and the marginal probabilities can be calculated. Unfortunately, the marginal probabilities for the hidden nodes turn out to be flat after using GenericEMLearn:

marginal D: WrappedArray(0.25, 0.25, 0.25000000000000006, 0.25000000000000006)
marginal J: WrappedArray(0.25000000000000006, 0.25, 0.25, 0.25)

Additionally, an odd error occurs: if I set evidence in a LoopyBP more than once, I start getting NaN values back:


    val loopyBP = LoopyBP(skinGraph)
    loopyBP.calibrate()
    (1 to 44).foreach { sidx =>
      val s1 = dataSet.samples(sidx)
      (1 to 10).foreach { x =>
        loopyBP.setEvidence(x, s1(x))
      }
      loopyBP.marginal(11).getValues.foreach(x => print(x + " "))
      println()
    }
actual value was 0
0.0 0.0 0.0 1.0 
actual value was 0
NaN NaN NaN NaN 

This problem actually occurs for the sprinkler example as well, so I must be using setEvidence incorrectly somehow.

2 Answers


I'm the creator of the bayes-scala toolbox you are referring to. Last year I implemented EM for discrete Bayesian networks, for learning from incomplete data (including latent variables); that looks like the use case you are asking about.

A tutorial for a sprinkler Bayesian network is here.

And "Learning Dynamic Bayesian Networks with latent variables" is covered here.

Zhubarb
  • When I tried to translate my DAG into a cluster graph, I first got a "sepset not equal to 1" error because I handled the D,E,F region of the DAG incorrectly. After fixing that problem, when I run GenericEMLearn I end up getting positive log-likelihoods and NaN values for marginals. I do not fully understand how to make a cluster graph out of a factor graph or DAG. I looked around for a resource on this and could not find one; do you have a suggestion for how to make a cluster graph from a DAG? The sprinkler example you gave doesn't fully define that transformation. – Thomas Luechtefeld Jun 07 '13 at 12:54
  • I will have a look during the weekend and try to learn your network with bayes-scala – Daniel Korzekwa Jun 07 '13 at 12:56

Regarding "...Additionally an odd error occurs where if I set evidence in a LoopyBP more than once I start getting NA ...":

I think it's because you set the evidence in the same Bayesian network while iterating over all samples in the training set, and you end up with some factors having 0 probability for all factor values. For discrete factors, setEvidence() works by setting the probability to 0 for every factor value incompatible with the evidence. I will make it throw an error when setEvidence() ends up zeroing all factor values.
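
To illustrate with a simplified sketch (this is my own toy model of the behaviour, not the bayes-scala source): after a second, conflicting observation on the same unreset factor, every entry is 0, and normalising an all-zero factor gives 0.0 / 0.0 = NaN, which matches the output you posted.

```scala
// Hypothetical simulation of the behaviour, not the bayes-scala internals:
// setting evidence zeroes all factor entries incompatible with the observed
// value, and normalising an all-zero factor produces NaN.
def setEvidenceSim(factor: Array[Double], observed: Int): Array[Double] =
  factor.indices.map(i => if (i == observed) factor(i) else 0.0).toArray

def marginalSim(factor: Array[Double]): Array[Double] = {
  val z = factor.sum
  factor.map(_ / z) // z == 0 => 0.0 / 0.0 == NaN for every entry
}

val prior = Array(0.1, 0.2, 0.3, 0.4)
val once  = setEvidenceSim(prior, observed = 3)
println(marginalSim(once).mkString(" "))  // 0.0 0.0 0.0 1.0
val twice = setEvidenceSim(once, observed = 1) // conflicts with the first
println(marginalSim(twice).mkString(" ")) // NaN NaN NaN NaN
```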

Why would you want to set the evidence in a single Bayesian network over all samples?

Regarding the flat probabilities for hidden nodes: remember that EM is not guaranteed to converge to a global maximum, so how you set the priors in the network matters a great deal. Please send me your code, including the training set, and I will check it further.
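
As a toy illustration of why a perfectly symmetric initialisation can leave the hidden-node marginals flat (hypothetical numbers, plain Scala, no bayes-scala): with uniform priors and identical likelihood columns the posterior over the latent states is itself uniform, so every EM iteration reproduces the same flat distribution; a small perturbation of the priors breaks the symmetry.

```scala
// Posterior over latent states from a prior and a per-state likelihood
// (toy numbers, not from the question's network).
def posteriorSim(prior: Array[Double], lik: Array[Double]): Array[Double] = {
  val joint = prior.zip(lik).map { case (p, l) => p * l }
  val z = joint.sum
  joint.map(_ / z)
}

// Uniform prior + identical likelihood per state => posterior stays flat,
// so EM never moves away from 0.25 for any state.
val uniformPrior = Array(0.25, 0.25, 0.25, 0.25)
val uniformLik   = Array(0.5, 0.5, 0.5, 0.5)
println(posteriorSim(uniformPrior, uniformLik).mkString(" "))

// Slightly perturbed priors break the symmetry and let EM separate states.
val perturbedPrior = Array(0.28, 0.26, 0.24, 0.22)
val informativeLik = Array(0.9, 0.1, 0.1, 0.1)
println(posteriorSim(perturbedPrior, informativeLik).mkString(" "))
```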

On factor graphs: bayes-scala also supports inference on factor graphs, but only for continuous and hybrid Bayesian networks using the Expectation Propagation algorithm; the two cases I tested are Kalman robot localisation and the TrueSkill rating model.

And the last thing, on sepsets including more than one variable: a current limitation of bayes-scala is that a sepset may contain only a single variable.

  • The data is: http://pastebin.com/wktYyefF The DAG is: http://i.imgur.com/tnVPFGM.png I was iterating over samples to test if the network could be used to predict variables for which I did not set the evidence (I'd like to use this network to predict the H variable given values for all the other non-latent variables). – Thomas Luechtefeld Jun 08 '13 at 16:16
  • I wrote a method for doing predictions using a LoopyBP object [here](http://pastebin.com/e0Bq9fNR). Ideally we wouldn't have to copy the LoopyBP cluster graph, and would instead just modify the message passing given evidence when calculating a marginal, but I haven't dived into the code enough to implement that yet. This function resolves the NaN problem. – Thomas Luechtefeld Jun 08 '13 at 17:23