Consider two distinct coins (COIN_1 and COIN_2) whose respective prior probabilities are given by:

$Pr_1(HEAD_1)=\alpha_1$

$Pr_2(HEAD_2)=\alpha_2$

Suppose that those two coins are jointly tossed $n$ times but the outcome of each joint toss is observed if and only if "HEAD_1, HEAD_2" is realized.

Is it possible to compute $Pr_1(HEAD_1 | ``HEAD_1,HEAD_2" realized\ k\ times\ in\ n\ tosses)$?

capadocia

1 Answer

You have two unobserved events for tossing heads on the first and second coin, let's call the events $X$ and $Y$. What you observe is another event $Z = X \land Y$.

If you can assume that $X$ and $Y$ are independent, then by definition

$$ \Pr(Z) = \Pr(X, Y) = \Pr(X) \, \Pr(Y) $$
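For intuition, here is a quick simulation sketch (the marginal probabilities are illustrative, matching the made-up values used in the example further below) showing that under independence the observed frequency of $Z$ matches the product of the marginals:

```r
# Sanity check: under independence, Pr(Z) = Pr(X) * Pr(Y).
# The marginal probabilities here are illustrative, not estimated.
set.seed(1)
n <- 1e6
x <- rbinom(n, size = 1, prob = 0.14)  # heads on coin 1
y <- rbinom(n, size = 1, prob = 0.57)  # heads on coin 2
z <- x * y                             # the only observable event

mean(z)        # empirical Pr(Z), close to 0.14 * 0.57 = 0.0798
```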

In such a case, you can use a simple model using two latent variables for the probabilities $p_X$ and $p_Y$

$$\begin{align} p_X &\sim \mathsf{Beta}(\alpha_X, \beta_X) \\ p_Y &\sim \mathsf{Beta}(\alpha_Y, \beta_Y) \\ Z &\sim \mathsf{Bernoulli}(p_X p_Y) \end{align}$$

If $\Pr(X)$ and $\Pr(Y)$ are very similar to each other, you wouldn't be able to differentiate between them. However, if they are distinct and you have reasonable priors, this could work. Below I provide a simple example where informative priors on $p_X$ and $p_Y$ lead to decent estimates (the parameters of a beta distribution can be thought of as pseudocounts, so it is as if, prior to seeing the data, we had observed a single head and nine tails for $X$).

library("rstan")
set.seed(42)

# Simulate 500 joint tosses; only z = x * y is observed
x <- rbinom(500, size=1, prob=0.14)
y <- rbinom(500, size=1, prob=0.57)
z <- x * y

model <- "
data {
    int<lower=0> N;
    int z[N];
}
parameters {
    real<lower=0, upper=1> p_x;
    real<lower=0, upper=1> p_y;
}
model {
    p_x ~ beta(1, 9);
    p_y ~ beta(5, 5);
    z ~ bernoulli(p_x * p_y);
}
"

stan(model_code=model, data=list(z=z, N=length(z)))

##         mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
## p_x     0.18    0.00 0.05    0.10    0.14    0.17    0.20    0.30   818    1
## p_y     0.55    0.00 0.13    0.30    0.45    0.54    0.64    0.80   899    1
## lp__ -161.56    0.03 0.97 -164.31 -161.91 -161.27 -160.87 -160.61  1148    1

Without the assumption of independence there is no simple relationship between the marginal and joint distributions, and your data carries only degraded information about the marginals, so this wouldn't be that simple.

However, if you can’t assume that $X$ and $Y$ are independent but you can trust that the prior marginal probabilities are correct, then it’s even simpler. Notice that, by the law of total probability,

$$ \Pr(X) = \Pr(X, Y) + \Pr(X, \neg Y) $$

So if you know $\Pr(X)$ and can estimate $\widehat\Pr(X, Y) = \tfrac{\#(X, Y)}{N}$, you can calculate $\Pr(X, \neg Y)$ by subtraction. The same logic applies to calculating $ \Pr(\neg X, Y)$. Next, observe that

$$ \Pr(X, Y) + \Pr(\neg X, Y) + \Pr(X, \neg Y) + \Pr(\neg X, \neg Y) = 1 $$

so you now have every piece except $\Pr(\neg X, \neg Y)$, which again can be obtained by simple algebra.
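To make the algebra concrete, here is a worked example with made-up numbers. Suppose the trusted marginals are $\Pr(X) = 0.14$ and $\Pr(Y) = 0.57$, and the observed frequency of joint heads gives $\widehat\Pr(X, Y) = 0.08$. Then

$$\begin{align} \Pr(X, \neg Y) &= 0.14 - 0.08 = 0.06 \\ \Pr(\neg X, Y) &= 0.57 - 0.08 = 0.49 \\ \Pr(\neg X, \neg Y) &= 1 - 0.08 - 0.06 - 0.49 = 0.37 \end{align}$$

and the four joint probabilities sum to one, as required.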

Tim
  • Your prior needs to encode which one is more likely, it shouldn't work if you got it exchanged, should it? – Firebug Oct 28 '21 at 17:26
  • @Firebug correct, it’d give opposite results if you switched priors. – Tim Oct 28 '21 at 17:54
  • @Tim I consider the question answered, but can you direct me to some resource that delves into what rstan is doing to calculate the posteriors in the example you gave (assuming independence)? I mean, I wonder if the posterior of $p_x$ is within the beta family, say, $\mathrm{Beta}(\alpha_x + k + t,\ \beta_x + n - k - t)$ for some $t = t(\text{parameters})$ (exactly or approximately)? – capadocia Oct 29 '21 at 14:36
  • @capadocia Stan uses MCMC to sample from the posterior. This is not a classical beta-binomial model, hence there is no conjugate-prior solution. – Tim Oct 29 '21 at 15:48