
Suppose a lepidopterologist wants to estimate the relative proportions of three different species of butterfly. They go out into the field and count $N$ butterflies and record the number of each species $(N_A,N_B,N_C)$. This is a Bayesian lepidopterologist, however, and so they want to use the Dirichlet-Multinomial conjugate pair to obtain a Dirichlet posterior on the probabilities $(p_A,p_B,p_C)$ that an observed butterfly is of each species. If the counts are integer and the prior hyperparameters of the Dirichlet are $(\alpha_A,\alpha_B,\alpha_C)$, then the posterior hyperparameters are simply $(\alpha_A+N_A,\alpha_B+N_B,\alpha_C+N_C)$.
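As a minimal sketch of this integer-count update (the prior and the counts below are hypothetical numbers, not data from the question):

```python
# Dirichlet-Multinomial conjugate update with integer counts.
# A flat prior (1, 1, 1) is assumed here for illustration.
prior = [1.0, 1.0, 1.0]   # (alpha_A, alpha_B, alpha_C)
counts = [12, 7, 5]       # (N_A, N_B, N_C), N = 24 observed butterflies

# Conjugacy: posterior hyperparameters are prior + counts, element-wise.
posterior = [a + n for a, n in zip(prior, counts)]  # -> [13.0, 8.0, 6.0]
```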

However, the lepidopterologist is not sure of the species of some of the butterflies and so assigns a probability $(p_A^i,p_B^i,p_C^i)$ to each individual butterfly $i$ that it is of each species. This probability is calculated quantitatively, under the assumption that all species are equally likely. These probabilities capture genuine uncertainty in the species determination of a butterfly, perhaps because some members of species A can look like members of species B. These probabilities are also unbiased, i.e. if we observed an infinite number of butterflies with $p_A^i = 0.54$, then 54% of those butterflies would be in species A.

Naively, the posterior hyperparameters would be of the same form but with $N_A$ now being the sum of the probabilities that each butterfly is of species $A$, etc.

Is this correct? If incorrect, are there limits where this is a good approximation?

TLDR: Can we generalise the Dirichlet-multinomial conjugate prior-likelihood pair to fractional count data?

Edit for clarity and to add example data:

We observe $N=4$ butterflies and record that their probability of being a member of each species is:

Butterfly 1 - $p_A = 0.34,p_B = 0.42,p_C = 0.24$

Butterfly 2 - $p_A = 0.14,p_B = 0.23,p_C = 0.63$

Butterfly 3 - $p_A = 0.97,p_B = 0.01,p_C = 0.02$

Butterfly 4 - $p_A = 0.00,p_B = 0.67,p_C = 0.33$
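For illustration, the naive fractional-count update applied to these four butterflies (a flat Dirichlet prior is assumed here; the question does not specify one) would be:

```python
# Naive fractional-count update: sum each butterfly's class probabilities
# to form "fractional counts", then add them to the prior hyperparameters.
prior = [1.0, 1.0, 1.0]  # flat Dirichlet prior, an assumption for this example
p = [
    [0.34, 0.42, 0.24],  # butterfly 1
    [0.14, 0.23, 0.63],  # butterfly 2
    [0.97, 0.01, 0.02],  # butterfly 3
    [0.00, 0.67, 0.33],  # butterfly 4
]

# Column sums give the fractional counts (N_A, N_B, N_C); they sum to N = 4.
fractional_counts = [sum(col) for col in zip(*p)]
posterior = [a + n for a, n in zip(prior, fractional_counts)]
```

With these numbers the fractional counts are $(1.45, 1.33, 1.22)$, giving posterior hyperparameters $(2.45, 2.33, 2.22)$. Whether this is actually the correct posterior is exactly what the question asks.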

  • What exactly does the data look like? Could you give us an example? Hard-coding the guesses as data to be passed to the model does not sound like the best approach. – Tim Apr 12 '19 at 09:56
  • The data would be of the form $(p_A^i,p_B^i,p_C^i)$ for each butterfly $i$, with $p_A^i+p_B^i+p_C^i = 1$. – diagonalisable Apr 12 '19 at 10:27
  • So you know only the probabilities and total count? – Tim Apr 12 '19 at 10:30
  • Yes. We know the total number $N$ and the probabilities for each butterfly. – diagonalisable Apr 12 '19 at 12:38
  • @diagonalisable did you get to a best solution for this? I have a similar problem on my hands. In my case it is a bit more complicated, as they are likelihoods rather than probabilities for the categorical uncertainty on observations. – Valentin Ruano Mar 09 '21 at 16:08
  • How did you end up solving this problem? – Jean-Paul Jun 10 '21 at 11:22

1 Answer


I would change the problem slightly:

$i$ = butterfly observation

$j$ = butterfly class (A,B,C)

$X_{i,j}$ = subjective probability, recorded by the observer, that butterfly $i$ is in class $j$

$\rho$ = vector of the true proportions of each class of butterfly in the observation area

$\alpha_i$ = vector of Dirichlet parameters for butterfly $i$

Assume that $X_i \sim \mathrm{Dirichlet}(\alpha_i)$, so that $\alpha_i$ describes the underlying distribution of the recorded observations. Each butterfly might have a different $\alpha_i$, describing the difficulty of classifying that particular butterfly. In other words, if multiple people tried to classify the same butterfly and they all recorded class probabilities, their observations would form a Dirichlet distribution. We only see one set of probabilities per butterfly.

Now, to simplify, assume that all the $\alpha_i$ come from the same underlying distribution, which is centered on $\rho$.

As a Bayesian, you can put a prior on $\alpha$ and then update via MCMC, using the likelihood $X_i \sim \mathrm{Dirichlet}(\alpha)$.

As a frequentist, you can estimate the Dirichlet parameters directly from the sample moments:

$\bar{X}_j = \alpha_j / \sum_j \alpha_j$

$s^2_{X_1} = \frac{z_1 (1 - z_1)}{1 + \sum_j \alpha_j}$ where $z_1 = \alpha_1 / \sum_j \alpha_j$

Note: you only need the second equation for one of the classes $j$.
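A minimal sketch of these moment equations in code, assuming the $X_i$ are collected as rows of a list of lists; the function name and the choice of class $j=1$ (index 0) for the variance equation are mine, not the answer's:

```python
def fit_dirichlet_moments(X):
    """Method-of-moments Dirichlet fit from probability vectors (rows of X).

    Inverts the two equations above: the sample means give the ratios
    alpha_j / sum(alpha), and the sample variance of one class gives the
    precision sum(alpha).
    """
    n = len(X)
    k = len(X[0])
    # X-bar_j: sample mean of each class, estimating alpha_j / sum(alpha).
    mean = [sum(row[j] for row in X) / n for j in range(k)]
    # Sample variance of class 1; s^2 = z(1-z) / (1 + sum(alpha)),
    # so sum(alpha) = z(1-z)/s^2 - 1, using z ~ mean[0].
    var0 = sum((row[0] - mean[0]) ** 2 for row in X) / (n - 1)
    alpha_sum = mean[0] * (1 - mean[0]) / var0 - 1
    return [m * alpha_sum for m in mean]
```

With enough observations this recovers the generating parameters approximately; as the note says, one class's variance suffices to pin down the overall precision.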

R Carnell
  • Thanks for the answer... however it somewhat oversimplifies the problem as stated, and I cannot apply it to my own, which is similar to the one posted. After looking for a solution, I guess the answer is that there is no convenient conjugate prior in this scenario. Then perhaps what remains to be asked, which is not asked in the original question, is how to address this in an approximate yet efficient way instead of "brute force" MCMC. – Valentin Ruano Mar 17 '21 at 04:30
  • I think the "approximate, yet efficient" way to address this is in the frequentist equations I put at the end of the answer. I have used these equations in a very similar situation where I asked experts to quantify class probabilities and then I needed to combine across experts. The only difference is that you truly are showing each expert a different butterfly while in my case, I showed each expert the same object and then elicited the class probabilities. – R Carnell Mar 17 '21 at 17:53
  • The notation above doesn't seem entirely correct. Given observations from each expert $(X_{i}\ \text{for}\ i=1,...,N )$, how would one uncover the parameters $\alpha_j$ needed to fit the final Dirichlet distribution? – Jean-Paul Jun 10 '21 at 11:13
  • I added some clarifying notation. Thanks for the comment. – R Carnell Jun 13 '21 at 22:08