
What are the advantages of using a log-linear representation rather than a table representation? Is it simply a computational issue (avoiding overflow)?

For example, in a Markov network A-B we can represent the factor P(A,B) as a table:

A B P(A,B)
0 0 10
0 1 1
1 0 1
1 1 10

Alternatively, we can represent the factor P(A,B) as a log-linear model:

$$P(A,B) = \exp\bigg(\sum\limits_{i=1}^4\theta_i f_i(D_i)\bigg)$$

Here each $f_i$ is an indicator function, so each $\theta_i$ is just the log of the corresponding entry in the table representation. What would be the advantages of the log-linear representation here?
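To make the equivalence concrete, here is a small Python sketch (the function and variable names are mine; the values come from the table above) showing that setting each $\theta_i$ to the log of a table entry reproduces the table exactly:

```python
import math

# Table representation of the (unnormalized) factor from the question.
table = {(0, 0): 10.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 10.0}

# Log-linear representation: one indicator feature per cell,
# with theta_i = log of the corresponding table entry.
theta = {cell: math.log(value) for cell, value in table.items()}

def phi_loglinear(a, b):
    # Each indicator f_i fires only for its own cell, so the weighted
    # sum collapses to the single active weight theta_{(a,b)}.
    score = sum(t for cell, t in theta.items() if cell == (a, b))
    return math.exp(score)

# The two representations agree on every cell.
for cell, value in table.items():
    assert abs(phi_loglinear(*cell) - value) < 1e-9
```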

gung - Reinstate Monica
Dzung Nguyen
    'Log linear models' is a term used to represent several different things. Can you give some clarifying context? Also, what do you mean by 'table representation'? – Glen_b Aug 11 '14 at 19:43
    Your probabilities seem to go up to 10. – conjugateprior Mar 13 '15 at 18:25
  • In markov networks, P(A,B) are unnormalized probabilities. They will be normalized later: http://en.wikipedia.org/wiki/Markov_random_field – Dzung Nguyen Mar 13 '15 at 18:33
  • I took the liberty of editing your post to make the English clearer. Please ensure it still says what you want it to. – gung - Reinstate Monica Mar 13 '15 at 18:33
    @conjugateprior only 10? [Mine goes up to 11](http://youtu.be/KOO5S4vxi0o)... – shadowtalker Mar 14 '15 at 11:15
  • In Markovian networks the Markovian properties are features of the graph structure linking a set of random variables not the scaling of the probability distribution over those variables. The link you included in your comment explains this quite clearly. – conjugateprior Mar 14 '15 at 15:00

2 Answers


There is a separate literature supporting the use of log-linear models that begins with Bishop et al.'s Discrete Multivariate Analysis (1975), extends through Leo Goodman's RC models beginning in the 1980s, Agresti's Categorical Data Analysis, and other books by Stephen Fienberg, and includes Wickens's excellent Multiway Contingency Tables Analysis for the Social Sciences (1989). Needless to say, these approaches are all appropriate for frequency, "count," or classificatory data.

The example given above is a simple 2x2 table, and there may be few advantages to a log-linear model in that case, since no sophisticated analysis is needed. One big advantage of the log-linear framework is the flexibility it offers for testing different table structures in dimensions higher than 2x2, distinguishing, for example, simple independence (the classic chi-square test) from conditional independence, depending on how you slice the table up. In addition, and beyond chi-squares, odds ratios are readily estimable as more suitable measures of effect size.
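As a quick illustration of the last point, here is a plain-Python sketch (the counts are made up for illustration) of the classic chi-square test of independence and the odds ratio for a 2x2 frequency table:

```python
# Hypothetical 2x2 table of counts (rows: group, columns: outcome).
table = [[30, 10],
         [15, 45]]

a, b = table[0]
c, d = table[1]
n = a + b + c + d

# Pearson chi-square statistic for independence, built from the
# expected counts under the independence model E_ij = row_i * col_j / n.
row = [a + b, c + d]
col = [a + c, b + d]
chi2 = sum((table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
           for i in range(2) for j in range(2))

# Odds ratio: the cross-product ratio, a natural effect-size measure.
odds_ratio = (a * d) / (b * c)
print(round(chi2, 2), odds_ratio)  # -> 24.24 9.0
```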

Clearly, there is more than one way to analyze frequency data. How one chooses to do it is a function of one's training and comfort level.

Mike Hunter

Most textbooks and slides I found just state that it's "common" or "convenient" to do so but don't explain why.

I've found two reasons that apply to Markov Networks:

  1. Exponentiating the weighted features makes sure that they are all larger than zero. Normalizing with the partition function Z makes sure that they all sum up to one. This way we get a valid probability distribution. This advantage is explained in this Coursera course (you have to register first).

  2. We take advantage of the fact that the exponential function is its own derivative. This makes it much easier to compute gradients when learning the weights, e.g. with gradient descent.
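Both points can be sketched in a few lines of Python (the weights are chosen to match the table in the question; the variable names are mine):

```python
import math

# Log-linear weights: theta_i = log of each table entry from the question.
theta = {(0, 0): math.log(10), (0, 1): 0.0, (1, 0): 0.0, (1, 1): math.log(10)}

# Point 1: exp(.) makes every unnormalized value strictly positive, and
# dividing by the partition function Z yields a valid distribution.
unnorm = {cell: math.exp(t) for cell, t in theta.items()}
Z = sum(unnorm.values())          # here Z = 10 + 1 + 1 + 10 = 22
P = {cell: v / Z for cell, v in unnorm.items()}
assert all(p > 0 for p in P.values())
assert abs(sum(P.values()) - 1.0) < 1e-12

# Point 2: because d/dx exp(x) = exp(x), the gradient of log Z with
# respect to theta_i is simply the probability of the matching cell:
# d log Z / d theta_i = exp(theta_i) / Z = P(cell_i).
grad_logZ = {cell: unnorm[cell] / Z for cell in unnorm}
assert abs(grad_logZ[(0, 0)] - P[(0, 0)]) < 1e-12
```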

From a numerical point of view, it is also preferable to sum log-probabilities rather than multiply many small probabilities, since the product may underflow the computer's floating-point precision.
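A quick demonstration of the underflow issue (with toy numbers):

```python
import math

# Multiplying 100 small probabilities underflows double precision,
# while summing their logs stays perfectly representable.
probs = [1e-5] * 100

product = 1.0
for p in probs:
    product *= p
# 1e-500 is far below the smallest positive double (~5e-324).
print(product)   # -> 0.0 (underflow)

log_sum = sum(math.log(p) for p in probs)
print(log_sum)   # about -1151.29, no numerical trouble
```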

Suzana