2

I have a finite Binomial mixture model coded up in stan as below:

data {
    int<lower=1> K; // number of mixture components
    int<lower=1> S; // number of nodes
    int N[S]; // sample size for each nodes
    int y[S]; // number of "successes" for each node
}
parameters{
    simplex[K] lambda; // mixing proportions
    positive_ordered[K] p; // probability of success 
}
model{
    vector[K] log_lambda = log(lambda); //caching
    
    // Priors
    p ~ beta(1, 1);
    
    // Likelihood
    for (s in 1:S){
        vector[K] lps = log_lambda;

        for (k in 1:K){
            lps[k] += binomial_lpmf(y[s] | N[s], p[k]);
        }
        target += log_sum_exp(lps);
    }
}

My issue here is that p needs to be between 0 and 1 but Stan doesn't seem to allow setting bounds for ordered vectors (i.e. ordered<lower=0, upper=1>[K] p doesn't work) and I need them ordered for identifiably. Is there a way to set bounds for an ordered vector?

Also, I tried ordering the simplex instead as in https://discourse.mc-stan.org/t/ordered-simplex/1835 but that didn't work. The sampling took very long and the chains hadn't mixed well.

1 Answers1

2

It sounds like you want $p$ to be a vector of uniform order statistics. (This is what you get when you generate iid uniform variates and sort them.) If that's the case, then generate a $K+1$-vector $z$ of iid exponential variates and set $p$ to be the first $K$ components of the cumulative sum of $Z$, divided by the sum of all the $Z.$ That is,

$$p_i = \frac{z_1+z_2+\cdots+z_i}{z_1+z_2+\cdots+z_{K+1}},\quad i=1,2,\ldots, K.$$

Since each $z_i$ is non-negative, the $p_i$ are in increasing order.

This is illustrated at https://stats.stackexchange.com/a/134245/919 and explained at https://stats.stackexchange.com/a/252784/919.

BTW, one way to generate $z$ is to generate $K+1$ iid uniform variates (just like p in your example, but with one more component) and take their negative logarithms.

whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • Thanks @whuber! I set this up as within my stan model to add `vector[K+1] z` in the parameters block and adding another block as `vector[K+1] p_inter = cumulative_sum(z)/sum(z)` and `p[k] = p_inter[k]` for a K dimensional vector p. However, the same issue persists of taking too long and the chains not mixing as while trying to order a simplex. I don't know whether this is because of how the underlying MCMC algorithm works or because of how I have coded it up. Any ideas? – stan_newbie Jan 20 '21 at 13:35