
A toy problem to illustrate my issue:

We put 100 people in a room with 10 candy bars. Each bar is different: a different brand, flavor, size, color, etc. We ask each person in the group to choose the candy bar that looks the most appealing.

  • 100 people
  • 10 candy bars
  • each person chooses 1 bar

At the end of the experiment we record each person's candy bar choice and figure out what percent of the group chose each candy bar.

obs_1     
Bar 1: 10%
Bar 2: 25%
Bar 3: 14%
Bar 4: 1% 
Bar 5: 2% 
Bar 6: 40%
Bar 7: 2% 
Bar 8: 3% 
Bar 9: 1% 
Bar 10: 2%

Let's say we ran this experiment 10 times with different groups and for the 11th experiment, we wanted to try to forecast what percent of people would choose each candy bar.

Because the candy choice percentages will sum to 1 (sum to unity) every time and there is an element of randomness, this seems like a perfect problem to model with a Dirichlet distribution. I'm hopeful that we can forecast choices in future experiments with a Dirichlet regression (DirichletReg in R) or possibly using Stan.

Question 1: I'm at a loss as to how to format my data for a Dirichlet Regression. In my real data the "candy bars" are constantly being replaced. In one observation I'll have bars 1-10, but in the next observation bars 1 and 4 are replaced by bars 11 and 12 and might look like this:

obs_2      
Bar 11: 9% 
Bar 2: 13% 
Bar 3: 26% 
Bar 12: 22%
Bar 5: 1%  
Bar 6: 20% 
Bar 7: 2%  
Bar 8: 3%  
Bar 9: 1%  
Bar 10: 2% 

My full dataframe looks like this:

+--------------+------------+----------------+-----+------------+----------------+
| obs_number   | bar_1_size | bar_1_chosen_% | ... | bar_15_size| bar_15_chosen_%|
+--------------+------------+----------------+-----+------------+----------------+
| 1            | 15.5       | 10.0           | ... | NULL       | NULL           |
+--------------+------------+----------------+-----+------------+----------------+
| 2            | NULL       | NULL           | ... | NULL       | NULL           |
+--------------+------------+----------------+-----+------------+----------------+
| ...          | ...        | ...            | ... | ...        | ...            |
+--------------+------------+----------------+-----+------------+----------------+
| 11           | 15.5       | NULL           | ... | 33.6       | NULL           |
+--------------+------------+----------------+-----+------------+----------------+

There's a column for every bar's size and chosen_percentage even if it wasn't an option in an experiment (like Bar 15, which wasn't an option in experiment 2 but was an option in experiment 11).

How would I properly format this data for DirichletReg?

Question 2: This is more advanced, but how would I format the regression formula? I'm really lost on this one; I'm not sure whether I should use the "common" or "alternative" models or how I would format the formula.

I know this is a long question, but I appreciate you taking the time to read it and any help you all can offer. Cheers!

George
    This problem seems more suited to multinomial regression (with 10 discrete categories) than using a Dirichlet. – guy Sep 30 '19 at 15:55
  • *"Each box contains different candy bars"* Do you mean that there are *multiple different* bars in a *single* box? – Sextus Empiricus Sep 30 '19 at 16:18
  • The different cases 'box 1 - 10' 'box 1 - 4' and 'box 11 - 12' each independently are distributed according to a multinomial distribution. But a different distribution (with different parameters) each time. You will need to have some theory to relate the parameters of those different distributions, before you can start to think about applying a statistical model and test. – Sextus Empiricus Sep 30 '19 at 16:22
  • @MartijnWeterings Good catch, I edited my example to clarify that there are 10 choices of bars. (I took the boxes out) – George Sep 30 '19 at 16:41
  • you mean that different bars had different, varying actual sizes, that affected choice because people would chose bigger bars more often? – carlo Sep 30 '19 at 16:46
  • @carlo I just used size as an example of a possible predictor. The idea is that there exist different features of each bar that could be used to predict how many people chose it in the context of other bar choices. – George Sep 30 '19 at 16:52
  • so, in fact, the total number of bars is given by the composition of all possible features levels, and there is no main feature that sets apart 10 or maybe more bars? what I am really asking is: what is N in the table you drew? – carlo Sep 30 '19 at 17:06
  • @carlo There are N total candy bars that exist, being put out in the experiment 10 at a time. – George Sep 30 '19 at 17:13

2 Answers


Disclaimer/ Important Note:

The example below assumes that the ratio of probabilities of selecting the various bars is completely independent of the specific selection available.

This is likely not the case. For instance, the decoy effect is an obvious example showing that the relative probabilities of choosing a specific bar may change drastically (even invert) depending on the options available to the participants/consumers.

If you wish to capture these effects then you will need to make the model much more flexible. But then the question is more about how to model preference (the deterministic part of the underlying probabilities) than about modelling the random behaviour.

The example below shows how you could proceed once you have a model: it uses the data to estimate the parameters by finding the values that maximize the likelihood function.

But note that to get a reasonable idea of which model to use, you should first explore/plot your data and see how it behaves before you apply a statistical model (such as assuming a Dirichlet distribution).


Logistic regression

You might model your system in terms of the Gibbs measure (that is, as a multinomially distributed variable, not a Dirichlet-distributed variable).

Let $P(Y_i = j)$ be the probability that person $i$ selects candy bar $j$ with properties $X_{j}$:

$$P(Y_i = j) = \frac{e^{-\beta X_j}}{\sum_{\forall j} e^{-\beta X_j}}$$

Here the term $e^{-\beta X_j}$ relates to logistic regression (you might wish to model it differently and more flexibly, as mentioned before) and the term $\sum_{\forall j} e^{-\beta X_j}$ is a normalization constant.


Example computation, maximizing log likelihood

This normalization constant is problematic in your situation because the set of available bars $j$ is not the same every time. I do not know of a package that allows you to compute this (it is also not a typical situation). The example below shows how you might do it manually, by finding the parameters that maximize the likelihood function:

set.seed(1)
# sample size
n <- 1000

# generate theoretic probabilities and properties of bar choice
bar_len <-   c(0.5,1,1.5,2,0.5,1,1.5,2)
bar_sugar <- c(1,1,1,1,2,2,2,2)
p_bars = c(exp(bar_len+bar_sugar)/sum(exp(bar_len+bar_sugar)),0)

# some table for sampling of the bars with different type and number of bars
barsamples <- t(sapply(1:n, FUN = function(x) {r <- 1+rbinom(1,7,0.5);  c(sample(1:8,r),rep(9,8-r))}))
barsamples

# simulating the choice of participants 
barprobs <- matrix(p_bars[barsamples],n)
choice <- sapply(1:n, FUN = function(x) sample(barsamples[x,],1,prob=barprobs[x,]))

#
# estimating model parameters by maximizing likelihood
#

# likelihood function 
# the first argument 'par' will be optimized by the call to the function 'optim'
#
loglike <- function(par, choi = choice, bsamp = barsamples, blen = bar_len, bsug = bar_sugar) {
  # compute theoretic (unnormalized) selection weights, plus a zero for the dummy bar 9
  p <- c(exp(par[1]*blen + par[2]*bsug), 0)
  # compute the probability of each observed choice, normalized over the bars on offer
  p_choice <- p[choi]/rowSums(matrix(p[bsamp], nrow(bsamp)))
  # return the negative log-likelihood, since 'optim' minimizes
  -sum(log(p_choice))
}

optim(c(0,0),loglike)

which in this case has true model parameters $1, 1$ and returns roughly $0.87, 0.91$ as the estimate.

I imagine that in a more typical situation you would not use a linear model as above but instead some sort of neural network that can capture more varied features than the fixed, presupposed (limited) linear model.


Dirichlet regression

Dirichlet regression models your variables as Dirichlet-distributed. But a multinomial distribution feels more natural to me (probabilities for counts). Your raw data is categorical, with values 0 or 1; that is not what a Dirichlet distribution describes (it is a continuous distribution).

Possibly, you wish to model the distribution like a Dirichlet-multinomial distribution.

If you insist on using a Dirichlet distribution then you need outcome variables that describe fractions. E.g. you observe several groups of 100 people and record the fractions of bars chosen in each group. Here you would assume that for a single group the distribution is multinomial, but that the parameters describing that multinomial distribution are themselves variable (not every group of 100 people is the same) and distributed according to a Dirichlet distribution.
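A minimal sketch (base R only, with hypothetical concentration parameters) of this two-level idea: each group's multinomial parameters are drawn from a Dirichlet distribution, here generated as normalized Gamma draws:

```r
set.seed(1)
alpha <- c(2, 5, 3)   # hypothetical Dirichlet concentration parameters, 3 bars

rdirichlet1 <- function(a) {
  # a Dirichlet draw is a vector of independent Gamma draws, normalized to sum to 1
  g <- rgamma(length(a), shape = a)
  g / sum(g)
}

groups <- t(replicate(10, {
  p <- rdirichlet1(alpha)                  # this group's own preference vector
  rmultinom(1, size = 100, prob = p)[, 1]  # 100 people choosing among 3 bars
}))
groups           # 10 groups x 3 bars; every row sums to 100, but rows vary
rowSums(groups)  # all equal 100
```

The between-group variation in the rows is exactly what the Dirichlet layer adds on top of a plain multinomial model.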

In any case, because you are dealing with a result vector that is not continuous, you may not be able to use standard R packages like DirichletReg. Such packages model the outcome variable $Y_i$ for each sample $i$ (a vector of fractions) as a function of $\beta X_i$, where $X_i$ are properties of the sample $i$ (describing the group, e.g. age, gender, etc.).
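For reference, a rough sketch (base R, hypothetical numbers) of the layout such packages typically expect: one row per observation, a compositional response matrix whose rows sum to 1, plus per-observation covariates. The `DR_data`/`DirichReg` calls in the comment indicate the typical DirichletReg workflow, not code tailored to this problem:

```r
# hypothetical choice fractions: one row per observation/experiment, one
# column per bar; each row must be a complete composition summing to 1
props <- rbind(
  c(0.10, 0.25, 0.65),   # obs 1: fractions choosing bars 1..3
  c(0.09, 0.13, 0.78)    # obs 2
)
covars <- data.frame(group_age = c(34, 41))  # properties of the group, not of the bars

stopifnot(all(abs(rowSums(props) - 1) < 1e-9))

# with the DirichletReg package one would then roughly do:
#   Y <- DR_data(props)
#   DirichReg(Y ~ group_age, data = covars)
```

Note that this layout has no natural slot for bar-level properties or for bars that are absent in some observations, which is the core difficulty of the question.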


Hack example

In those standard packages (which model $Y_i$ as a function of sample properties $X_i$, with the same set of components for every sample) there is no way to give the vector $Y_i$ a different size for different samples/observations $i$.

You might hack it by adding variables to the matrix $X_i$ that describe whether or not a particular option was present, but it is not natural/exact/realistic/precise. (I show this to illustrate the idea; I am not suggesting that you should do this.)

Example with `multinom` from the nnet package

library(nnet)

# make predictor variable 'sel' that encodes whether or not the
# particular bar was available as a choice in each experiment
sel <- 1 - sapply(1:8, FUN = function(x) rowSums(barsamples == x))

# one-hot matrix of the observed choices
choicematrix <- matrix(0, n, 8)
for (i in 1:n) {
  choicematrix[i, choice[i]] <- 1
}

# modeling as a multinomial distribution with 'multinom'
mod <- multinom(choicematrix ~ 1 + sel)

# getting intercept term
intercept <- c(0,coef(mod)[,1])
# normalizing
intercept <- exp(intercept)/sum(exp(intercept))

# compare
plot(p_bars[-9],intercept, xlim=c(0,1),ylim=c(0,1))
lines(c(0,1),c(0,1))

(plot: estimated probabilities from the hack vs. the theoretical probabilities, with the identity line for comparison)

This models the probability for each bar to be selected as a function of whether or not certain bars are in the sample (so the properties $X_i$ refer to the test conditions defining which bars were not in the sample). The intercept refers to the solution giving a probability to each bar (but not as a function of the properties of that bar).

Sextus Empiricus
  • I appreciate your response and the time/attention you've given to my question. In my real data there are groups >> 100, anywhere in the range 30k-500k. So I think treating my response variable as continuous will be relatively accurate. (Which I know I didn't say in my question, and I'm starting to think I should maybe re-ask it with my actual variables/data to make answerers' lives easier.) I also tried something similar to your hack example where I included a binary variable `is_option` and included it in my regression coefficients (size * is_option) to make any unavailable options == 0. – George Oct 01 '19 at 16:51
  • I then tried to use DirichletReg pretending that all bars were available at each observation, but I got a collinearity error (also it felt hacky so I didn't really dig into the solution much). – George Oct 01 '19 at 16:53
  • @George it only requires little adaptation for my answer to be adapted for a Dirichlet distribution (just a different likelihood function). My main concern remains and that is that you need to start from a well defined model, instead of just applying any arbitrary fitting. – Sextus Empiricus Oct 01 '19 at 17:03
  • Sorry about that, I should have mentioned (though you probably have figured) that my experience is in software engineering and not applied statistics. I'm not used to solving statistical problems and I can see I'm getting ahead of myself. – George Oct 01 '19 at 17:07
  • Let me take a step back. I have explored my data quite a bit and I started (similar to what is stated in your response) with an assumption of independence. I treated each bar in each experiment as independent and tried using linear regression to model the choice_%. I found that there were features (sugar, length, etc.) that had a significant effect on choice_%. I also noticed that the sum of all choice_%s in an experiment was != 1. Sometimes it was very far off. Which led me to believe that I could improve the model. – George Oct 01 '19 at 17:28
  • Which led me to google for 'component regression' and find this paper about DirichletReg (http://epub.wu.ac.at/4077/) and another about modeling player selection in fantasy games (http://www.sloansportsconference.com/wp-content/uploads/2018/02/1001.pdf) (pages 7/8). I basically found these two papers and fixated on a Dirichlet regression as the underlying model for choice_%. Especially because the problem they are solving in the fantasy paper is very close to the real problem I'm trying to solve. – George Oct 01 '19 at 17:48
  • 1
    @George also note that 'group size' is likely going to influence the variance of your observable variable. Large and small groups will have a different distribution of the fractions. With a Dirichlet multinomial distribution you can take this into account. When you treat all observations separately (everything as groups of size 1) then the two coincide (but possibly you might have computational profit from treating larger groups at once). – Sextus Empiricus Oct 01 '19 at 17:57
  • It's also important to know that I don't have exact counts for my data, I only have the percentages as real numbers (12.5, 44.2, etc). – George Oct 01 '19 at 17:58
  • When you have percentages plus group size then you can convert it to single accounts (although it would be unnecessary). – Sextus Empiricus Oct 01 '19 at 18:01
  • That's a good point :) – George Oct 01 '19 at 18:06
  • The point about the Dirichlet multinomial distribution is actually different than what I said before: The Dirichlet multinomial distribution models the *group* probabilities as a Dirichlet distribution (and you might wish to model the *individual* probabilities as independent Dirichlet distributed, [in which case your counts will be, after all, binomial distributed](https://stats.stackexchange.com/questions/105908/sum-of-beta-bernoulli-variables)). – Sextus Empiricus Oct 01 '19 at 18:27
  • I apologize because I feel that most of what you are suggesting is going above my head. Maybe I'm not really ready to approach this problem. I think I'm going to take some time to learn more about bayesian modeling, different distributions and STAN and come back and ask this question again when I feel more comfortable. Do you have any recommended learning resources or things it might be helpful for me to read? – George Oct 01 '19 at 18:41

Bayesian multinomial regression

I would recommend (as stated in the comments) multinomial regression; the Dirichlet distribution is the conjugate prior for the multinomial likelihood, so your posterior will be a Dirichlet distribution.

In this case, there is no difference between running the experiment 10 times with $N$ participants each time and thinking of it as one experiment with $10N$ participants.

If your prior is $Dirichlet(\alpha_{1}, \alpha_{2}, \ldots, \alpha_{10})$, your posterior will be $Dirichlet(\alpha_{1}+n_{1}, \alpha_{2}+n_{2}, \ldots, \alpha_{10}+n_{10})$,

In which $n_{1}$ is the number of people who chose candy bar 1 as their favourite, and so forth.

Your prior should probably be (unless you have information that you did not state) that all candy bars are equally popular, and thus $\alpha_{1}=\alpha_{2}=\ldots=\alpha_{10}$. The common value these take governs how restrictive you want to be, i.e. how strongly you wish to penalise as implausible those posteriors in which some candy bars are much more popular than others.

If you set all $\alpha_{i}$ to 1, that's the equivalent of a uniform prior.

In terms of how to prepare your data: for each candy bar, you only need to count how many people chose it as their favourite, and then your posterior is analytically known, as above.
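As a sketch of this conjugate update, using a uniform prior and counts taken from the obs_1 example in the question (10% of 100 people chose bar 1, and so on):

```r
alpha_prior <- rep(1, 10)  # uniform prior over the 10 bars
counts <- c(10, 25, 14, 1, 2, 40, 2, 3, 1, 2)  # choices observed in obs_1

alpha_post <- alpha_prior + counts            # Dirichlet posterior parameters
post_mean  <- alpha_post / sum(alpha_post)    # expected choice fractions
round(post_mean, 3)
```

The posterior mean shrinks the raw fractions slightly toward uniformity; e.g. bar 1's estimate is $11/110 = 0.1$ here only because the prior and the data happen to agree.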

Also, if you want to know the posterior for any individual candy bar (i.e. for the fraction of the world, call it $q$, likely to prefer candy bar $k$), it is given by the beta distribution $\beta(q \mid \alpha_{k} + n_{k}, \sum_{j\neq k} (\alpha_{j} + n_{j}))$.

Bayesian multinomial regression when not sampling all candidates

Edit (as I had originally misread your post/you added more information, see comments): I would still propose a solution involving multinomial regression, leading to a Dirichlet posterior, which ignores the features of the candy bars (e.g. size, flavour). What underpins all of this is that, even if you knew the probability of a member of the general population preferring each candy bar when choosing from the full selection, you would still need an assumption about how their choice changes when you remove some bars from the pool. If you assume that preferences are redistributed in a random/proportional way (see below), and not, say, that people who prefer the largest bar are likely to prefer the second-largest one if the largest is removed, then I suggest ignoring the features completely.

For simplicity, let us assume there are only 3 different bars for now.

We know that a randomly selected member of the population has a probability $q_{i}$ of saying they prefer the $i^{th}$ bar, such that $\sum_{i=1}^{3}q_{i}=1$. We now ask somebody chosen at random to tell us whether they prefer bar 1 or bar 2 (so we don't show them bar 3). In that case, I am going to assume that the probability they will prefer bar 1 is $p_{1}=\frac{q_{1}}{q_{1}+q_{2}}$ and the probability they will prefer bar 2 is $p_{2}=\frac{q_{2}}{q_{1}+q_{2}}$.

Clearly, this is a simplifying assumption: as another response to this question states, the decoy effect makes it potentially a bit shaky, and as I stated above, candy bar 3 might be more similar to 2 than to 1. But the assumption might work well in scenarios in which all bars are sufficiently different, and it allows for a tractable solution. In words, this solution assumes that if we remove bar 3 from the pool, the people who would have preferred it transfer their preferences to bars 1 and 2 randomly, but according to the ratio in which people prefer bars 1 and 2.
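A tiny numeric sketch of this proportional-redistribution assumption (with hypothetical full-pool probabilities $q$):

```r
q <- c(0.5, 0.3, 0.2)   # hypothetical full-pool preference probabilities
available <- c(1, 2)    # bar 3 removed from the pool

# restricted-choice probabilities: rescale q over the available bars
p <- q[available] / sum(q[available])
p   # 0.625 0.375: bar 3's 0.2 share is split 5:3 between bars 1 and 2
```

The ratio $p_1 / p_2 = q_1 / q_2$ is preserved, which is exactly what the assumption above requires.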

So given we perform one experiment only, and we want to determine $q_{1},q_{2},q_{3}$ from some data, we want to calculate $P(\underline{q}|D)$, which by Bayes is given by $\frac{P(D|\underline{q})P(\underline{q})}{P(D)}$. The only way this is different to more conventional multinomial regression, is that $P(D|\underline{q})$ is not just a categorical distribution.

In this case, if the first experiment resulted in $n_{11}$ people choosing bar 1 and $n_{12}$ people choosing bar 2 (the first index in the subscript indicating that this was experiment number 1, and the second denoting which bar it was), and $n_{13}=0$ because bar 3 was not an option in experiment 1, then the probability of seeing the data we saw, given some vector q, is given by (up to a constant) $\left(\frac{q_{1}}{q_{1}+q_{2}}\right)^{n_{11}}\left(\frac{q_{2}}{q_{1}+q_{2}}\right)^{n_{12}}$

Using a Dirichlet prior with parameter vector $\underline{\alpha}$ yields:

$P(q|D_{1}) \propto \left(\frac{q_{1}}{q_{1}+q_{2}}\right)^{n_{11}}\left(\frac{q_{2}}{q_{1}+q_{2}}\right)^{n_{12}} q_{1}^{\alpha_{1}-1}q_{2}^{\alpha_{2}-1}q_{3}^{\alpha_{3}-1}$

(I've added a subscript to the $D$ here to denote that this is the data from experiment 1.) The proportionality coefficient is hard to calculate and will have to be computed numerically, but because there's more to come, let's just treat it as a number to be determined later.

So now, let's say in experiment 2, we only let them choose between candy bars 2 and 3. Then $P(D_{2}|q)\propto \left(\frac{q_{2}}{q_{2}+q_{3}}\right)^{n_{22}}\left(\frac{q_{3}}{q_{2}+q_{3}}\right)^{n_{23}}$, where $n_{22}$ and $n_{23}$ are the numbers of people who chose bars 2 and 3 in experiment 2 respectively.

So now if you want to know $P(q|D_{2})$ (which technically is $P(q|D_{2}, D_{1})$), you can do the same trick, but now, you use $P(q|D_{1})$ where you would have used your Dirichlet prior $P(q)$ previously.

Explicitly: $P(\underline{q}|D_{1},D_{2})\propto \left(\frac{q_{2}}{q_{2}+q_{3}}\right)^{n_{22}}\left(\frac{q_{3}}{q_{2}+q_{3}}\right)^{n_{23}}\left(\frac{q_{1}}{q_{1}+q_{2}}\right)^{n_{11}}\left(\frac{q_{2}}{q_{1}+q_{2}}\right)^{n_{12}} q_{1}^{\alpha_{1}-1}q_{2}^{\alpha_{2}-1}q_{3}^{\alpha_{3}-1}$

I hope you can see a pattern developing, and how this process can be continued over many experiments with any combination of available candy bars.

There is one final step, which is to turn the constant of proportionality into an equality. I'll outline here for this case, where there were only 2 experiments.

We know that $\int d\underline{q} P(\underline{q}|D_{1},D_{2})=1$, and thus:

$P(\underline{q}|D_{1},D_{2})= \frac{\left(\frac{q_{2}}{q_{2}+q_{3}}\right)^{n_{22}}\left(\frac{q_{3}}{q_{2}+q_{3}}\right)^{n_{23}}\left(\frac{q_{1}}{q_{1}+q_{2}}\right)^{n_{11}}\left(\frac{q_{2}}{q_{1}+q_{2}}\right)^{n_{12}} q_{1}^{\alpha_{1}-1}q_{2}^{\alpha_{2}-1}q_{3}^{\alpha_{3}-1}}{\int d\underline{q} \left(\frac{q_{2}}{q_{2}+q_{3}}\right)^{n_{22}}\left(\frac{q_{3}}{q_{2}+q_{3}}\right)^{n_{23}}\left(\frac{q_{1}}{q_{1}+q_{2}}\right)^{n_{11}}\left(\frac{q_{2}}{q_{1}+q_{2}}\right)^{n_{12}} q_{1}^{\alpha_{1}-1}q_{2}^{\alpha_{2}-1}q_{3}^{\alpha_{3}-1}}$

Note that this integral is over the simplex on which $q_{1}+q_{2}+q_{3}=1$ and $0 \leq q_{i} \leq 1, \forall i$.

I suspect this integral cannot be done analytically, so you'll need to solve it numerically. I'm no expert on which Monte Carlo sampling method you'll want to use; perhaps somebody else can suggest whether this is best suited to HMC or another sampler. Whether you need to normalise will, however, depend on what you want to do with the result.

Similarly, you'll need to do more numerical integrals if you want to calculate $\langle q_{i} \rangle$, i.e. the expected fraction of the general population who will prefer bar i.
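As a rough sketch of one way to approximate these expectations (with hypothetical counts and a uniform $Dirichlet(1,1,1)$ prior): draw $\underline{q}$ from the prior (normalized Gamma draws) and form a self-normalized importance-sampling estimate, weighting each draw by the unnormalized likelihood of the two restricted-choice experiments:

```r
set.seed(1)
n11 <- 60; n12 <- 40   # hypothetical experiment 1: bars 1 vs 2
n22 <- 30; n23 <- 70   # hypothetical experiment 2: bars 2 vs 3

# draws from the Dirichlet(1,1,1) prior as normalized Gamma(1) draws
m <- 1e5
g <- matrix(rgamma(3 * m, shape = 1), ncol = 3)
q <- g / rowSums(g)

# unnormalized likelihood of both experiments for each draw of q
w <- (q[, 1] / (q[, 1] + q[, 2]))^n11 * (q[, 2] / (q[, 1] + q[, 2]))^n12 *
     (q[, 2] / (q[, 2] + q[, 3]))^n22 * (q[, 3] / (q[, 2] + q[, 3]))^n23

# self-normalized importance estimate of the posterior means <q_i>
post_mean <- colSums(q * w) / sum(w)
round(post_mean, 3)
```

With realistic sample sizes the likelihood concentrates sharply and prior sampling becomes inefficient, which is why an MCMC sampler (e.g. in Stan) would be the more robust route; this sketch only illustrates the quantity being computed.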

gazza89
  • I added more clarifying information to the question, it might give you a better idea of what I'm asking. I definitely agree about the uniform prior. – George Sep 30 '19 at 17:29
  • so the point I'd missed, or perhaps you just added, is that in experiments 1 and 2 you don't necessarily have the same candy bars. If candy bar 1 has a value in experiment 1 and is null in experiment 2, and candy bar 11 is null in experiment 1 and present in 2, what does this mean? Does it just mean that nobody chose candy bar 11 in experiment 1, or does it mean that it wasn't there as an option during experiment 1? – gazza89 Sep 30 '19 at 17:46
  • I thought it was there, but I realized the question could use a lot of clarifying. A null value for chosen_% means that it wasn't an option. There are some number of candy bars available to display to people (I arbitrarily have that number as 15 in the example), but only 10 candy bars are displayed at a time. – George Sep 30 '19 at 17:53
  • I think you need in that case, to think a little bit about what the final thing you're after is. Usually, statisticians would answer the question "if you asked everybody in the world, which candy bar they would prefer, giving them all 15 of them as an option, what fraction of the population would choose each bar?". I think that question is not simple to answer given that data, it's quite complex, because you need to start assuming what a person does, if their favourite bar would have been number 3, but it wasn't available. Presumably their vote is randomly allocated across the available? – gazza89 Sep 30 '19 at 17:59
  • Yeah, I appreciate your insight and helping me work through the problem. I guess I'm thinking about which bars people will prefer as a function of the qualities of the bar (size, flavor, etc) rather than of the people's own preferences (which are unknown to us). I think we can model the problem with a Dirichlet distribution whose alpha values are predictable given past experiments, and then sample that distribution to gain insights about upcoming experiments. I'm just not really sure how to do that with the existing tools (I'm a software engineer, not a statistician) – George Sep 30 '19 at 18:14
  • To be more precise, is the question you're trying to answer "we have these 15 candy bars, and there will not be more or less in future. We have performed various experiments in which people were asked which was their preferred one, but from a subset of the 15, and now we want to know, if we asked a randomly chosen person from the world's population, which of the 15 was their preferred one (but given the ability to chose from all 15), what the probability of each of the 15 is?" – gazza89 Sep 30 '19 at 18:27
  • we have these 15 candy bars, and there will not be more or less in future. We have performed various experiments in which people were asked which was their preferred one, but from a subset of the 15, and now we want to know, if we asked a randomly chosen person from the world's population, which FROM A NEW SUBSET was their preferred one, what the probability of each of the BARS IN THAT NEW SUBSET is? (Sorry for using caps, I didn't know how else to highlight the differences) – George Sep 30 '19 at 18:52
  • Thanks George. I think that in order to answer the question as I stated it (i.e. from the entire subset), you have to make an assumption about how people's preferences transfer from one candy bar, if it becomes unavailable. Once you've made that assumption, you can then answer your slightly different question. I'll try to find time to edit my original post and update the maths, but basically I think you need an extra assumption in order to solve this problem tractably. – gazza89 Oct 01 '19 at 10:09
  • The part after your edit (*"Edit (as I had originally misread your post/you add more information, see comments):"*) is difficult to follow. But your idea of using a Bayesian approach with the likelihood function a multinomial distribution and the prior/posterior Dirichlet distribution is interesting (although it is not clear how to incorporate the properties of the bars and this models the different bars more as separate without using the properties of the bars). Then you can see the process as repeatedly updating marginal Dirichlet distributions according to your first formula. – Sextus Empiricus Oct 01 '19 at 12:21
  • if there are specific steps that are hard to follow, I'm happy to have a go at improving the explanation. As regards your point on it being difficult to see how to incorporate properties of the bars, I fully agree, it's probably not possible. The whole approach relies on an assumption of how preferences for bar i are redistributed if it is removed from the available pool, and this being independent of these characteristics. – gazza89 Oct 01 '19 at 13:00
  • I am wondering whether a Dirichlet distributed prior will be a Dirichlet distributed posterior when only a restricted amount of the fraction is observed. Does your answer show this? – Sextus Empiricus Oct 01 '19 at 13:11
  • 1
    yes, my answer does discuss this. I believe the answer is no, if you make the assumption I make (big if), because your likelihood function starts to incur terms like $(q_{1}+q_{2})$ in the denominator, so the Dirichlet Distribution is no longer the conjugate prior to this likelihood function.Note the line where I write Using a Dirichlet prior with parameter vector $\alpha$ yields: $P(q|D_{1})\propto (\frac{q_{1}}{q_{1}+q_{2}})^{n_{11}} (\frac{q_{2}}{q_{1}+q_{2}})^{n_{12}}q_{1}^{\alpha_{1}-1}q_{2}^{\alpha_{2}-1}q_{3}^{\alpha_{3}-1}$, this is no longer a Dirichlet Distribution – gazza89 Oct 01 '19 at 13:31
  • It might be closely related to a Grouped Dirichlet Distribution https://doi.org/10.1016/j.jmva.2007.01.010 or tho this generalization https://en.wikipedia.org/wiki/Generalized_Dirichlet_distribution – Sextus Empiricus Oct 01 '19 at 13:42