Assume an experiment with 6 outcomes, dubbed A, B, C, D and E. For outcome A, there are Na subject, for B there are Nb subject and so on.
Now assume we fit a multinomial regression using a Bayesian approach,
y ~ Mult($\theta$)
Where $\theta_i$ the probability of observing outcome $i$, and
$\theta_i = \frac{exp(\phi_i)} {\sum exp(\phi)}$
with $\phi_i$ being a linear combination of the $J$ covariates,
$\phi_i = \beta_0 + \sum_j^J \beta_j\cdot x_j$
And there are simple non-informative $\mathcal{N}(0,1)$ priors on the regression coefficients.
Also note that in this model I have not chosen a reference category.
The general interpretation of such a model could be, what is the relative risk (ratio?) of A compared to B? That would be given by,
RR(A, B) = $\frac{exp(\phi_A)}{exp(\phi_B)}$
If anything is unclear, wrong or imprecise in the model, please let me know! Even if it does not answer the actual questions.
Now, my questions are:
1. Is there an actual need to choose a reference category? Or is it purely to make the maximum-likelihood estimation easier? Does it really matter in a Bayesian setting?
- When comparing two groups, does one get an estimate of the Relative Risk or Relative Risk Ratio?
- What if I wanted to compute, say, the RR of belonging to outcome A or B compared to C? Or what about the RR of belonging to A or B, compared to C or D?
A previous answer on SO states that the ratio is a Relative Risk, but I have read some papers reporting it as a Relative Risk Ratio.
Thanks in advance!
edit 1: I have managed to answer one of the questions myself by re-reading the literature.
- Yes! Otherwise the estimate is not unique, due to the constraint that all probabilities must sum to one. That is, $\sum_i \theta_i = 1$. See also Wikipedia
I am still not sure about the other two questions, though.
edit 2 Fitting the model with Stan, it is next to impossible to get it to converge. Inspecting the data more closely, it seems that there is a very large number of zeros. The exact model I am trying to fit is,
$\theta_i = \beta_0 + \beta_{gender} + \beta_{age} + \beta_{age^2}$
Gender is a discretised variable, and age is taken as the midpoint of a group in 5 year intervals. This means that there are 20 combinations in total, and a lot of these observations are zero. That is, for each combination, maybe 2-3 of each outcome has a zero value. I am thinking this is what is causing trouble, and I have been looking into a zero-inflated Poisson (ZIP) model instead. However, from what I have read so far, the ZIP model only handles discrete variables, meaning I would not be able to model the quadratic term, $age^2$.
Any pointers on how to account for zero inflation in multinomial models would be much appreciated!