You describe a dataset that could be represented as a sequence of tuples $(n_{1i}, n_{2i}, k_{1i}, k_{2i})$ where $k_{ji}$ is an observation of a random variable $K_{ji}$ that follows a Binomial$(n_{ji}, p_{ji})$ distribution. Your model supposes that all the $K_{ji}$ are independent, the $n_{ji}$ are known, and for each $i,$ $p_{1i}=\alpha\,p_{2i}.$ Thus the unknown parameters are $\alpha,$ whose value you wish to estimate, along with the "nuisance parameters" $p_{2i}.$
Simplify the notation by writing $p_{2i} = p_i.$ In terms of these parameters, the independence assumption implies the likelihood of the data is
$$\mathcal{L} = \prod_i \binom{n_{1i}}{k_{1i}}\left(\alpha p_i\right)^{k_{1i}}\left(1-\alpha p_i\right)^{n_{1i}-k_{1i}}\ \prod_i \binom{n_{2i}}{k_{2i}}\left(p_i\right)^{k_{2i}}\left(1-p_i\right)^{n_{2i}-k_{2i}}.$$
Ignoring the factors that depend only on the data, the part of $\mathcal L$ that depends on the parameters is
$$\mathcal{L}\,\propto\, \prod_i \left(\alpha p_i\right)^{k_{1i}}\left(1-\alpha p_i\right)^{n_{1i}-k_{1i}}\left(p_i\right)^{k_{2i}}\left(1-p_i\right)^{n_{2i}-k_{2i}}.$$
Maximize the likelihood in two stages. First, given an arbitrary value of $\alpha,$ find the $p_i$ that maximize $\mathcal L.$ To do so, let $p=p_i$ be any one of these parameters. The factor of $\mathcal L$ that varies with $p$ is merely
$$\lambda_i(p;\alpha) = \left(\alpha p\right)^{k_{1i}}\left(1-\alpha p\right)^{n_{1i}-k_{1i}}\left(p\right)^{k_{2i}}\left(1-p\right)^{n_{2i}-k_{2i}}.$$
The usual differential calculus procedure applies: the critical points of $\lambda_i$ (as a function of $p$) are the endpoints $\{0, \min(1,1/\alpha)\}$ of its domain together with the zeros of its derivative. Drop the "$i$" subscripts for the moment. A straightforward calculation shows those zeros satisfy the quadratic equation
$$\alpha n\, p^2 - (\alpha(n_1+k_2)\,+\,n_2+k_1)\,p + k = 0$$
where $n = n_1+n_2$ and $k=k_1+k_2.$ This gives up to four candidate values of $p$ (the two endpoints together with up to two roots of the quadratic), of which the best (the one that makes $\mathcal L$ largest) can be selected by evaluating $\mathcal L$ at each. Doing this for every $i$ yields the profile likelihood, a function of $\alpha$ alone, and the value of $\alpha$ that maximizes it is the maximum likelihood estimate $\hat\alpha.$
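As a quick numerical check of this quadratic, here is a small R sketch (the values of $n_j,$ $k_j,$ and $\alpha$ are arbitrary illustrations, not from the figure) confirming that its interior root zeroes the derivative of $\log\lambda$:

#
# Check: roots of the quadratic are critical points of log(lambda).
# (Illustrative values; any valid data will do.)
#
n1 <- 10; k1 <- 3; n2 <- 8; k2 <- 5; alpha <- 1/2
A <- alpha*(n1 + n2); B <- -(alpha*(n1 + k2) + n2 + k1); C <- k1 + k2
roots <- (-B + c(-1, 1)*sqrt(B^2 - 4*A*C)) / (2*A)
p <- roots[roots > 0 & roots < min(1, 1/alpha)]  # Keep interior roots only
(k1 + k2)/p - alpha*(n1 - k1)/(1 - alpha*p) - (n2 - k2)/(1 - p)  # ~ 0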
Other values of $\alpha$ for which the deviance

$$2\left(\log\mathcal{L}(\hat\alpha) - \log\mathcal{L}(\alpha)\right)$$

is less than the $1-q$ quantile of the chi-squared distribution with one degree of freedom form a $1-q$ confidence interval for $\alpha.$
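For a single pair of groups this interval can be computed directly. Here is a minimal self-contained sketch (made-up data; the names loglik, profile, and dev are ad hoc, and it profiles out $p$ numerically with optimize instead of using the quadratic) that inverts the 95% deviance threshold with uniroot:

#
# Profile-likelihood CI for alpha with one pair of groups (sketch).
#
n1 <- 10; k1 <- 3; n2 <- 8; k2 <- 5                  # Made-up data
loglik <- function(p, alpha)                          # Joint log likelihood
  dbinom(k1, n1, alpha*p, log=TRUE) + dbinom(k2, n2, p, log=TRUE)
profile <- function(alpha)                            # Maximize over p numerically
  optimize(loglik, c(1e-9, min(1, 1/alpha) - 1e-9), alpha=alpha,
           maximum=TRUE)$objective
alpha.hat <- optimize(profile, c(0.01, 20), maximum=TRUE)$maximum
dev <- function(alpha)                                # Deviance minus threshold
  2*(profile(alpha.hat) - profile(alpha)) - qchisq(0.95, 1)
c(uniroot(dev, c(0.01, alpha.hat))$root, alpha.hat,   # Lower endpoint, MLE,
  uniroot(dev, c(alpha.hat, 20))$root)                # and upper endpoint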
Here are graphs of the negative profile log likelihood, plotted against $\log\alpha,$ for 18 datasets created with $\alpha=1/2.$ The data are indicated in the titles with two lines of the form "$k_{ji}/n_{ji}$" (the top line is for $j=1$). The true value of $\alpha$ is indicated with vertical dotted red lines while the value of $\hat \alpha$ is indicated with a vertical solid black line. The $1-1/18 \approx 94\%$ confidence intervals are formed by all $\alpha$ for which the graph falls below the horizontal solid red lines.

There appears to be little systematic bias in the estimates.
With 18 independent 94% intervals, we would expect the true $\alpha$ to fall outside the confidence interval in about one of these datasets. This occurs in row 2, column 4 and is close to occurring in row 1, column 1 and row 3, columns 5 and 6. However, repetitions of this procedure (with different starting random number seeds) indicate it is working as planned: only about one in every 18 confidence intervals fails to cover the true value of $\alpha.$
This is a fairly difficult test: the sample sizes are small and in several cases there were no "successes" at all in one of the data groups. Further simulations indicate this procedure works well even with tiny datasets (such as two groups of data averaging three observations per group).
Here is the R code used to produce the figure.
#
# Quadratic solver.
# Returns real roots of Ax^2 + Bx + C as a 2 X n array.
#
qsolve <- function(A, B, C) {
  D <- B^2 - 4*A*C
  q <- suppressWarnings(-B + ifelse(B > 0, -1, 1) * sqrt(D))
  i <- apply(rbind(A, B, C), 2, zapsmall)[1, ] == 0
  rbind(ifelse(i, -C/B, 2*C / q), ifelse(i, NaN, q / (2*A)))
}
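# Example (illustration only, not part of the figure code):
#   qsolve(1, -3, 2)  # returns rbind(1, 2), the roots of x^2 - 3x + 2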
#
# Log likelihood.
#
L <- function(p, alpha, n1, n2, k1, k2) {
  # Invalid values of p have zero likelihood; return -Inf (not +Inf) so
  # they can never win the maximization over candidates in `lambda`.
  if (is.na(p) || p < 0 || p > 1 || alpha*p > 1) return(-Inf)
  log0 <- function(n, x) suppressWarnings(ifelse(n==0, 0, n * log(x))) # log(x^n)
  sum(log0(k1, alpha * p) + log0(n1 - k1, 1 - alpha * p) +
      log0(k2, p) + log0(n2 - k2, 1 - p))
}
#
# Negative profile log likelihood.
#
lambda <- Vectorize(function(a, n1, n2, k1, k2) {
  alpha <- exp(a) # Since alpha > 0, use log(alpha) = a as the parameter
  p.hat <- qsolve(alpha * (n1 + n2), -(alpha * (n1 + k2) + n2 + k1), k1 + k2)
  p.hat <- t(rbind(p.hat, 0, 1))                 # Include endpoints of the interval
  p.hat <- pmax(0, pmin(min(1, 1/alpha), p.hat)) # Restrict to valid values
  Q <- mapply(L, c(p.hat), alpha, n1, n2, k1, k2)# Compute log likelihoods
  Q <- apply(matrix(Q, length(n1)), 1, max)      # Find the maxima
  Q <- ifelse(k1+k2==0 | k1+k2==n1+n2, 0, Q)     # Take care of extreme cases
  -sum(Q)                                        # Negative log likelihood
}, "a")
#
# Simulation.
#
set.seed(17)
alpha.true <- 1/2
nrow <- 3
ncol <- 6
par(mfrow=c(nrow, ncol))
mai <- par("mai")
par(mai=c(0.5,0.3,0.3,0.1))
for (i in 1:(nrow*ncol)) {
  #
  # Data.
  #
  repeat {
    n1 <- 1 + rpois(3, 7)          # 3 = number of groups; 7+1 = mean size
    n2 <- 1 + rpois(length(n1), 7) # 7+1 = mean size of second groups
    p <- pmin(runif(length(n1)), 1/alpha.true)
    k1 <- rbinom(length(n1), n1, pmin(1, alpha.true * p))
    k2 <- rbinom(length(n2), n2, p)
    if (sum(k1)+sum(k2)==0 || sum(k1)+sum(k2)==sum(n1)+sum(n2)) {
      warning("Nothing can be done with MLE.")
    } else {
      break
    }
  }
  #
  # EDA.
  #
  title1 <- paste(k1, n1, sep="/", collapse=" ")
  title2 <- paste(k2, n2, sep="/", collapse=" ")
  #-- Starting estimate for alpha
  alpha.hat <- log(sum(k1)*sum(n2) / (sum(k2)*sum(n1)))
  if (is.infinite(alpha.hat)) alpha.hat <- log(1/(sum(n1) + sum(n2)))
  #
  # MLE.
  #
  fit <- optimize(lambda, lower=alpha.hat-1, upper=alpha.hat+1,
                  n1=n1, n2=n2, k1=k1, k2=k2)
  #
  # Plotting.
  #
  logalpha.hat <- fit$minimum
  a1 <- min(logalpha.hat, log(alpha.true)-1)
  a2 <- max(logalpha.hat, log(alpha.true)+1)
  curve(lambda(x, n1, n2, k1, k2), a1, a2,
        col="Gray", lwd=2, ylab="", xlab="")
  mtext(text=paste0(title1, "\n", title2), side=3, line=0.2,
        cex=min(1.2, 12/ncol/length(n1)))
  mtext(text=expression(log(alpha)), side=1, line=2.3, cex=0.75)
  abline(v = logalpha.hat, lwd=2)
  abline(v = log(alpha.true), lwd=2, lty=3, col="Red")
  Q <- lambda(logalpha.hat, n1, n2, k1, k2)
  Q.upper <- Q + qchisq(1 - 1/(nrow*ncol), 1)/2
  abline(h = Q.upper, lwd=2, col="Red")
}
par(mai=mai, mfrow=c(1,1))
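If numeric interval endpoints are wanted in addition to the graphs, one possible follow-up is a sketch that reuses `lambda` and the objects left over from the final panel of the loop. It assumes the profile rises above the threshold within 10 log units on both sides of the minimum, which can fail in degenerate cases (such as no successes at all in one group):

#
# Numeric endpoints of the last panel's confidence interval (sketch).
#
f <- function(a) lambda(a, n1, n2, k1, k2) - Q.upper
exp(c(uniroot(f, c(logalpha.hat - 10, logalpha.hat))$root,
      uniroot(f, c(logalpha.hat, logalpha.hat + 10))$root))  # CI for alpha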