
I am running an odds ratio calculation for CpG site methylation in cases versus controls.

In this situation, is it preferable to use a conditional or an unconditional MLE? I ask because R uses a conditional estimator while scipy uses an unconditional one. See here.

As a result, I am getting differences in the calculated p-values. I found a 1984 article suggesting that the conditional MLE is far superior. If that is the case, why does the scipy documentation suggest that the unconditional MLE is much more common?

This was also asked here by gotgenes, but no answer has been provided so far.
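To make the difference concrete, here is a sketch (the table counts are made up for illustration) contrasting scipy's unconditional (sample) estimate with a conditional MLE obtained by maximizing the Fisher noncentral hypergeometric likelihood of the observed cell given all four margins, which is the quantity `fisher.test` in R reports as its odds ratio. This assumes scipy >= 1.6 for `nchypergeom_fisher`.

```python
# Sketch: unconditional (sample) OR vs conditional MLE for a 2x2 table.
# The table values are made up for illustration.
import numpy as np
from scipy import stats, optimize

table = np.array([[10, 5], [3, 12]])
a, b = table[0]
c, d = table[1]

# Unconditional (sample) odds ratio -- what scipy.stats.fisher_exact reports
sample_or, p_value = stats.fisher_exact(table)
assert sample_or == (a * d) / (b * c)  # ad/bc

# Conditional MLE -- maximize the Fisher noncentral hypergeometric
# likelihood of the observed cell `a`, conditioning on all margins
# (this is what R's fisher.test reports as its odds-ratio estimate)
M = a + b + c + d  # grand total
n = a + c          # first-column margin
N = a + b          # first-row margin

def neg_log_lik(log_or):
    # Optimize on the log scale so the search space is unconstrained
    return -stats.nchypergeom_fisher.logpmf(a, M, n, N, np.exp(log_or))

res = optimize.minimize_scalar(neg_log_lik, bounds=(-10, 10), method="bounded")
conditional_mle = float(np.exp(res.x))

print(sample_or, conditional_mle)  # the two estimates differ
```

For non-degenerate tables the conditional MLE is typically less extreme than the sample OR. Newer scipy (>= 1.10) wraps exactly this calculation in `scipy.stats.contingency.odds_ratio(table, kind='conditional')`.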

odds_extractor <- function(cpg_id, disease, control) {
        # Subset the frequency table for a single CpG site
        temp <- cpg_list[[cpg_id]][c(disease, control), ]
        # Extract the a, b, c, d cell counts of the 2x2 table
        a <- as.numeric(temp[1, 1])
        b <- as.numeric(temp[1, 2])
        c <- as.numeric(temp[2, 1])
        d <- as.numeric(temp[2, 2])
        # Compute the odds ratio (fmsb::oddsratio); `rts` is a constant
        # defined elsewhere in the script, presumably a continuity correction
        oddsratio(a + rts, b + rts, c + rts, d + rts)
}
...
writting[i,4] <- odds_extractor(names(cpg_list)[i],disease,control)$estimate
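Since the goal is a Python port, here is one hypothetical sketch of the helper above. The `cpg_list` structure, the toy counts, and the value of `rts` are all stand-ins (assumptions, not the original script's data); the formula is the plain unconditional sample OR that scipy would report, with the same additive correction applied to every cell as in the R code.

```python
# Hypothetical Python port of the R odds_extractor; `cpg_list` and `rts`
# mirror names from the R script, but their contents/values are assumed.
rts = 0.5  # continuity-correction constant; actual value assumed

# Toy stand-in for the R cpg_list: CpG id -> {group label: (unmeth, meth)}
cpg_list = {
    "cg00000029": {"disease": (12, 30), "control": (25, 18)},
}

def odds_extractor(cpg_id, disease, control):
    """Return the continuity-corrected sample odds ratio for one CpG site."""
    temp = cpg_list[cpg_id]
    a, b = temp[disease]
    c, d = temp[control]
    # Unconditional (sample) OR, as scipy.stats.fisher_exact would report
    return ((a + rts) * (d + rts)) / ((b + rts) * (c + rts))

print(odds_extractor("cg00000029", "disease", "control"))
```

Note this reproduces only the point estimate; matching R's conditional estimate would require maximizing the noncentral hypergeometric likelihood instead.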
    Given [the commentary around when this was implemented](https://github.com/scipy/scipy/issues/1483), I suspect the scipy developers haven't a clue what the difference is. –  Jul 03 '17 at 21:02
  • @DevonRyan That's actually quite useful. I did not stumble upon that. And yes it might be good to move it to cross-validated - though considering odds ratios are a popular metric amongst biostatisticians it's probably useful here too. – quantik Jul 03 '17 at 21:05
  • @Llopis I am rewriting some R scripts to python mostly for debugging purposes and I noticed the difference in the p-values calculated. It was a significant enough difference to be noted – quantik Jul 03 '17 at 21:06
  • "I spoke with a graduate student with regards to Fisher's Exact Test's odds ratio calculation. The calculation implemented in this proposed addition to SciPy is the one he was familiar with. I guess this is what R is calling the "unconditional Maximum Likelihood Estimate". R uses something called the "conditional Maximum Likelihood Estimate" for this ratio. I have absolutely no idea how R calculates this, or even where this calculation derives from, and neither did the graduate student. As shown by Josef, the results are very different between the two." Same question - left unanswered. – quantik Jul 03 '17 at 21:14
  • The direct issue of why the help says something is probably as answered as it will ever be; the basic issue of what the difference is and why you'd prefer one or the other seems like a good question. What's the R function you're calling? (in case someone wants to see the help or the code you're talking about) – Glen_b Jul 03 '17 at 21:26
  • @Glen_b the R code uses the package fmsb. Specifically the estimate call uses oddsratio$estimate from this package. – quantik Jul 03 '17 at 21:35
  • 2
    Your comment is slightly unclear (since `oddsratio$estimate` would be a piece of an object, not a function call). You mean you call the function `oddsratio` and look at the returned value of `estimate`? Out of curiosity, what was the problem with `fisher.test` in vanilla R, which produces an odds ratio and computes a p-value for Fisher's exact test? (Fisher's exact test conditions on the marginal totals, which I presume is the conditioning under discussion here) – Glen_b Jul 03 '17 at 22:48
  • @Glen_b I was not the author of the script - I will ask the author why they did so. Also, I updated my post with the relevant section of the R script. Even if `fisher.test` were used, I think I'd still encounter the same problem. – quantik Jul 03 '17 at 23:54
  • 2
    Probably related: https://stats.stackexchange.com/q/54530/82584 – bli Jul 04 '17 at 07:15
  • The conditional odds ratio has the advantage that exact calculations (i.e. derivation of exact if perhaps conservative CIs and p-values, as well as median unbiased estimates) become easier and there are pretty efficient algorithms for that. This is quite interesting, if you can end up with y1/n1 compared to y0/n0 with e.g. y0=0 (or near zero). On the other hand, e.g. Firth's penalized likelihood version of the unconditional estimate is pretty good, too. One should probably never use the simple unconditional estimate for small numbers. No idea exactly what these packages really use though. – Björn Jul 04 '17 at 07:43
  • @Björn Yeah dealing with small numbers here. I am not sure why the scipy implementation did not give the option for calculating a conditional estimator, and I can't find a library that does. So I will probably have to do the calculation manually. – quantik Jul 04 '17 at 13:21
  • Might use a Mantel-Haenszel estimator instead. I figure that is more consistent with the conditional estimator at low sample sizes than the vanilla unconditional estimator? – quantik Jul 04 '17 at 13:31
  • @Glen_b why was this marked as a duplicate? If it is indeed a duplicate, it'd be good for you to link exactly where this was asked before, given that the question linked is not asking the same thing AND it never received a proper answer. – quantik Jul 05 '17 at 15:00
  • @quantik The main criterion on whether they're duplicates is whether the same answers would answer both questions; I agree that the questions themselves are different, but it looks to me like Jon's and AdamO's answers between them deal fairly directly with what I see as your question (though it's unfortunate that Jon has not expanded on his answer, because an expanded answer there would likely be very helpful). If you feel that those two answers don't answer your question, could you try to edit to focus your question on what is not addressed by them? I will reopen. – Glen_b Jul 05 '17 at 21:36
  • 2
    Was the 1984 article you mention the one by Hauck? – Glen_b Jul 05 '17 at 21:42
  • @Glen_b yes that's the one published in Biometrics I believe – quantik Jul 06 '17 at 18:26
  • I still can't clearly see how those other answers don't respond to your question - please do as requested above. – Glen_b Jul 07 '17 at 01:59

0 Answers