
Suppose I want to sample N values (uniformly) between 0 and 1, subject to a sum-to-one constraint. One possibility would be to use a rejection step:

repeat  
    x = rand(N-1)  
    sumX = sum(x)  
until sumX < 1.0  
x(N) = 1 - sumX  
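For reference, the rejection step above can be sketched in runnable form (Python/NumPy here; the function name is mine):

```python
import numpy as np

def sample_sum_to_one(n, rng=np.random.default_rng()):
    """Rejection sampler: draw n-1 uniforms, accept only when they
    leave room for a nonnegative final coordinate, then fill in the
    remainder so the vector sums to one."""
    while True:
        x = rng.random(n - 1)
        s = x.sum()
        if s < 1.0:
            return np.append(x, 1.0 - s)
```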

which is feasible for small N, but for larger N the following would be more efficient:

x = -log(rand(N))  
x = x/sum(x)  
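A runnable sketch of this normalization trick in Python/NumPy (it draws a Dirichlet(1,…,1) vector, i.e. uniform on the simplex; the function name is mine):

```python
import numpy as np

def sample_simplex(n, rng=np.random.default_rng()):
    """Sample uniformly from the unit simplex by normalizing
    i.i.d. Exponential(1) draws, obtained as -log of uniforms."""
    e = -np.log(rng.random(n))  # Exponential(1) samples
    return e / e.sum()
```

No rejection loop is needed, so the cost grows only linearly in N.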

See e.g. the earlier question "Generate uniformly distributed weights that sum to unity?"

The thing is that I additionally need to impose lower and upper boundaries (somewhere between 0 and 1) on some of the N values. These constraints are guaranteed to be compatible with the sum-to-one constraint. For example, suppose N=5 with lower and upper boundaries for the first 4 numbers being:
lower= [0, 0.1, 0.1, 0] and upper = [1, 0.7, 0.8, 0.3].

The accept/reject approach remains essentially the same, and even becomes somewhat more efficient:

lower = [0; 0.1; 0.1; 0]  
upper = [1; 0.7; 0.8; 0.3]  
range = upper - lower  

repeat  
    x = rand(N-1) .* range + lower  
    sumX = sum(x)  
until sumX < 1.0  
x(N) = 1 - sumX  
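The bounded variant, again as a Python/NumPy sketch (the function name is mine; the uniforms are rescaled elementwise into [lower, upper)):

```python
import numpy as np

def sample_bounded(lower, upper, rng=np.random.default_rng()):
    """Rejection sampler with per-coordinate bounds on the first n-1
    values; the last coordinate absorbs the remainder 1 - sum(x)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    while True:
        x = lower + rng.random(lower.size) * (upper - lower)
        s = x.sum()
        if s < 1.0:
            return np.append(x, 1.0 - s)
```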

but for larger N it's still inefficient. Hence my question: how can I implement this in a way that remains feasible for larger N, and with variable boundaries?

Marzz
  • Could you please write down the density of your vector? It would clarify what your target distribution is. – Xi'an Mar 08 '15 at 20:17
  • "N values (uniformly) between 0 and 1" that "sum to 1" is a contradiction, since if they're uniform between 0 and 1, their expectation is 0.5, while if they sum to 1 their average must be exactly 1/N. The only case where that's not a contradiction is N=2. Please help resolve this issue. – Glen_b Mar 08 '15 at 23:36
  • Sorry, I'm not sure how to answer your questions. The question is how to (efficiently) replace the 2nd accept/reject algorithm by a sampling scheme that remains feasible for larger N (similar to how the 1st accept/reject algorithm was changed). Does that help? – Marzz Mar 09 '15 at 10:24

1 Answer


A direct answer to the question as I understand it (!) is to

  1. take a uniform sample $(U_1,\ldots,U_n)\stackrel{\text{iid}}{\sim}\mathcal{U}(0,1)$, with density$$\prod_{i=1}^n \mathbb{I}_{(0,1)}(u_i)$$
  2. derive the joint distribution of $(U_1,\ldots,U_{n-1})$ and of the sum $V=U_1+\cdots+U_n$, which has joint density $$\prod_{i=1}^{n-1} \mathbb{I}_{(0,1)}(u_i)\times\mathbb{I}_{(0,1)}(v-u_1-\cdots-u_{n-1})$$
  3. conclude with the conditional distribution of $(U_1,\ldots,U_{n-1})$ given $V=1$ [the marginal density of $V$ is that of the Irwin–Hall distribution, although this is irrelevant here!], which has density proportional to$$\prod_{i=1}^{n-1} \mathbb{I}_{(0,1)}(u_i)\times\mathbb{I}_{(0,1)}(1-u_1-\cdots-u_{n-1})$$with the last coordinate recovered deterministically as $u_n=1-u_1-\cdots-u_{n-1}$.

In practice, you can produce a sample from this target density using Gibbs sampling, as illustrated in this earlier answer of mine.
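One concrete way to run such a Gibbs sampler in the bounded case: repeatedly pick a pair of coordinates and redraw their split uniformly on the interval allowed by both coordinates' bounds, keeping the pair's sum (and hence the total) fixed. A Python/NumPy sketch under those assumptions (function names and the crude feasible start are mine, and bounds are assumed given for all N coordinates):

```python
import numpy as np

def gibbs_step(x, lower, upper, rng):
    """Pick two coordinates i, j and redraw x[i] uniformly on the
    interval permitted by both bounds, setting x[j] = s - x[i] so
    the total sum is unchanged."""
    i, j = rng.choice(x.size, size=2, replace=False)
    s = x[i] + x[j]
    # x[i] must satisfy its own bounds and leave x[j] = s - x[i] feasible
    lo = max(lower[i], s - upper[j])
    hi = min(upper[i], s - lower[j])
    x[i] = lo + rng.random() * (hi - lo)
    x[j] = s - x[i]

def gibbs_sample(lower, upper, sweeps=1000, rng=np.random.default_rng()):
    """Run pairwise Gibbs updates from a feasible starting point."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # feasible start: interpolate between the bound vectors so sum(x) = 1
    t = (1.0 - lower.sum()) / (upper.sum() - lower.sum())
    x = lower + t * (upper - lower)
    for _ in range(sweeps):
        gibbs_step(x, lower, upper, rng)
    return x
```

Each update is the full conditional of the uniform target on the polytope {sum(x) = 1, lower ≤ x ≤ upper} along the chosen pair, so the cost per sweep stays linear in N and no rejection is involved.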

Xi'an