5

I want to compare two profile likelihood curves and determine if they are significantly different from one another.

For example are the following curves significantly different from one another:

two profile likelihood curves

I realize I can find a 95% confidence interval for a value from a profile likelihood curve using likelihood ratio testing like so. With this method I can determine if a single value is outside of the 95% confidence range of one profile likelihood curve. But I want to compare two curves.

Ben Haley
  • 301
  • 2
  • 7

1 Answers1

2

So I tried to solve this myself.

The general strategy

  1. Convert the profile likelihood curves into probability distribution functions by exploiting the likelihood ratio test as described here.
  2. Sample points from within these probability distributions and determine if one is consistently higher or lower.

The functions to do this in R are below. I'd be happy to clarify things if someone runs across this question and prods me a bit.

Generate a cdf from a likelihood profile

# Generate cdf
# Given a set of likelihood return the corresponding cumulative 
# density function
cdf <- function(likelihoods) {
  cdf <- (1 - pchisq(log(max(likelihoods)) - log(likelihoods), 1)) / 2

  # After 0.5 (the most likely) cdf values should count up to
  # 1 instead of counting back down to 0
  greater_than_most_likely <- which.max(cdf):length(cdf)
  cdf[greater_than_most_likely] <- 1 - cdf[greater_than_most_likely]

  cdf
}

For the example I showed, this results in cdf curves as follows.

enter image description here

Determine the probability that two cdfs are different

# p greater
# Given two cumulative probability density curves estimate the p value
# associated with the hypothesis
# a > b
#
# Tests
# p_greater(c(0.000, 0.001), c(0.999, 1)) 
# # ~ 0
# p_greater(pnorm((-100:100)/100), pnorm((-100:100)/100)) 
# # ~ 0.50
# p_greater(pnorm((-100:100)/100, sd=2), pnorm((-100:100)/100, sd=1)) 
# # ~ 0.50
# p_greater(pnorm((-1000:1000)/100, mean=1), pnorm((-1000:1000)/100, mean=0)) 
# # ~ 1 - sum(rnorm(1000000, mean=1) >= rnorm(1000000, mean=0)) / 1000000
# # ~ 0.24
p_greater <- function(cdf_a, cdf_b, n_samples=10000) {
  a_sample <- which.closest(runif(n_samples), cdf_a)
  b_sample <- which.closest(runif(n_samples), cdf_b)
  a_greater <- a_sample > b_sample
  a_equal <- a_sample == b_sample
  1 - (sum(a_greater) + sum(a_equal)/2) / n_samples
}

# p different
# Like p greater but a two sided test
p_different <- function(cdf_a, cdf_b, n_samples=10000) {
  p_a_greater <- p_greater(cdf_a, cdf_b, n_samples)
  p_b_greater <- p_greater(cdf_b, cdf_a, n_samples)  
  min(p_a_greater, p_b_greater) / 2
}

so that I can call:

p_different(cdf(likelihoods_1), cdf(likelihoods_2))

In this case I got a p-value of ~ 0.16.

Ben Haley
  • 301
  • 2
  • 7
  • 4
    Your solution does not appear to address the substantive question, nor does it appear to have any theoretical support. As far as I can tell, you are evaluating the chance that a random variable drawn from the distribution corresponding to the peak of one profile curve would exceed a random variable drawn from the distribution corresponding to the peak of another profile curve. What justifies using that as a p-value? It's not one. – whuber Oct 22 '14 at 18:44
  • I'm not trying to draw from the peak of the likelihood profile, but anywhere along the likelihood profile with probability proportional to the how probable that value is given the likelihood. This is why I convert the the profile likelihood into a cdf for that number. I guess that's not clear. If you tell me what made you think I was only looking at peaks I can try to clarify. – Ben Haley Oct 22 '14 at 19:36
  • 2
    A likelihood does *not* describe a distribution of its parameters--not by any means: http://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability. – whuber Oct 22 '14 at 19:38
  • Yeah that is true generally. But is is possible to derive a confidence boundary for profile likelihoods. For example [1]. And hence I propose we can derive the full distribution function. Are you talking about profile likelihoods in particular or likelihoods in general? [1]: http://stats.stackexchange.com/a/78361/7494 – Ben Haley Oct 22 '14 at 19:53
  • I added some elaboration to the answer. Hopefully it clarifies things a bit. Thank you for critiquing it. – Ben Haley Oct 22 '14 at 20:03
  • 2
    I am still baffled by why treating a profile likelihood as if it were a CDF (of what random variable?) ought to produce a quantity that can be interpreted as a p-value. I'm not saying it *isn't*, but only that the connection hasn't been made. A p-value would reflect how the profile likelihoods would vary under the null hypothesis that both were generated by the same underlying distribution (and therefore differed only by chance). It would be based on a statistic that quantifies the discrepancies between two such curves and involves a Neyman-Pearson critical region. – whuber Oct 22 '14 at 20:48
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/18088/discussion-between-ben-haley-and-whuber). – Ben Haley Oct 22 '14 at 20:59