
As a follow-up to one of the answers to the question Expectation-Maximization with a coin toss: one of the users posted R code with an MLE example almost a year ago (and his last visit here was 3 months ago, so I doubt he will answer my question about his code).

In the maximization step he multiplies the likelihood of coin A (`row$weight_A`) by a log-likelihood (`llf_A`). I do not understand why it is allowed to multiply a likelihood by a log-likelihood. Why wouldn't we multiply a likelihood by a likelihood, or a log-likelihood by a log-likelihood? Yet the code mixes the two (the two quantities in question are summarized right after the code block below).

require("stats4");

## sample data from Do and Batzoglou
ds<-data.frame(heads=c(5,9,8,4,7),n=c(10,10,10,10,10),
    coin=c("B","A","A","B","A"),weight_A=1:5*0)

## "baby likelihood" for a single observation
llf <- function(heads, n, theta) {
  comb <- function(n, x) { #nCr function
    return(factorial(n) / (factorial(x) * factorial(n-x)))
  }
  if (theta<0 || theta >1) { # probabilities should be in [0,1]
    return(-Inf);
  }
  z<-comb(n,heads)* theta^heads * (1-theta)^(n-heads);
  return (log(z))
}

## the "E-M" likelihood function
em <- function(theta_A,theta_B) {
  # expectation step: given current parameters, what is the likelihood
  # an observation is the result of tossing coin A (vs coin B)?
  ds$weight_A <<- by(ds, 1:nrow(ds), function(row) {
        llf_A <- llf(row$heads,row$n, theta_A);
        llf_B <- llf(row$heads,row$n, theta_B);

    return(exp(llf_A)/(exp(llf_A)+exp(llf_B)));
  })

  # maximisation step: given params and weights, calculate likelihood of the sample
  return(- sum(by(ds, 1:nrow(ds), function(row) {
    llf_A <- llf(row$heads,row$n, theta_A);
    llf_B <- llf(row$heads,row$n, theta_B);

    return(row$weight_A*llf_A + (1-row$weight_A)*llf_B);
  })))
}

est<-mle(em,start = list(theta_A=0.6,theta_B=0.5), nobs=NROW(ds))
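To make the question concrete, here is what I read the two steps as computing (my notation, not the original author's). Writing $L_A(x_i)$ and $L_B(x_i)$ for the per-observation likelihoods, so that `llf_A` $= \log L_A(x_i)$, the E-step weight and the value returned by the maximisation step are

$$w_i = \frac{L_A(x_i)}{L_A(x_i) + L_B(x_i)}, \qquad -\sum_i \Big[\, w_i \log L_A(x_i) + (1 - w_i) \log L_B(x_i) \,\Big],$$

and my confusion is that $w_i$ lives on the probability scale while $\log L_A(x_i)$ lives on the log scale.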
  • If we exponentiate the last line, we get likelihood(A) raised to the power `row$weight_A`, multiplied by likelihood(B) raised to the power `1 - row$weight_A` (a short numerical check of this appears after these comments). The function actually returns the log-likelihood of the data; this was done to avoid multiplying very small numbers (though I wonder why the same was not done for the return value of `llf` itself, since it is much safer to compute `z` on the log scale). – German Demidov Sep 21 '16 at 15:15
  • `row$weight_A` is the likelihood that the data were produced by coin A, so it is exactly likelihood(A). `llf_A` is the log-probability of the observed tosses under `theta_A`. If we exponentiate, we get probability(theta_A) raised to the power likelihood(A), multiplied by the corresponding term for coin B. But I still do not understand why one would multiply likelihood(A) by a log-probability (the two are on different scales). Taking logs avoids underflow, but my question is why log probabilities are mixed with non-log probabilities in the product, and why the whole term `row$weight_A*llf_A` is on the log scale. – Alina Sep 21 '16 at 17:27
  • It looks like `row$weight_A` is the weight of the cluster, written as `exp(llf_A)/(exp(llf_A)+exp(llf_B))`, which gives the likelihood exactly as it is defined, $\prod p^x (1-p)^{(1-x)}$. Maybe I am wrong, but it looks exactly like this. – German Demidov Sep 21 '16 at 19:02
  • Now I know where my mistake was: I was always thinking of $\prod p^{\text{heads}} (1-p)^{\text{tails}}$ and therefore could not relate `row$weight_A` to the likelihood estimation. – Alina Sep 22 '16 at 07:41
  • Since your question is closed here, can you perhaps change/remove your comment on the original question http://stats.stackexchange.com/a/131479/65037 – user3096626 Nov 15 '16 at 08:56
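A minimal numerical check of the point made in the first comment, reusing the `llf` function from the question (the values of `heads`, `n`, `theta_A` and `theta_B` below are arbitrary and only for illustration):

heads <- 8; n <- 10
theta_A <- 0.6; theta_B <- 0.5
llf_A <- llf(heads, n, theta_A)   # log-likelihood of this observation under coin A
llf_B <- llf(heads, n, theta_B)   # log-likelihood of this observation under coin B
w <- exp(llf_A) / (exp(llf_A) + exp(llf_B))   # E-step weight, exactly as in the code

exp(w * llf_A + (1 - w) * llf_B)    # exponentiated weighted sum of log-likelihoods
exp(llf_A)^w * exp(llf_B)^(1 - w)   # likelihood(A)^w * likelihood(B)^(1-w): same value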
