
It is common to use weights in applications like mixture modeling and to linearly combine basis functions. The weights $w_i$ must often satisfy $w_i \ge 0$ and $\sum_{i} w_i = 1$. I'd like to randomly choose a weight vector $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ from the uniform distribution over such vectors.

It may be tempting to use $w_i = \frac{\omega_i}{\sum_{j} \omega_j}$ where $\omega_i \sim U(0,1)$; however, as discussed in the comments below, the resulting distribution of $\mathbf{w}$ is not uniform.
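A quick numerical check (mine, in Python) makes the non-uniformity concrete: for $n = 3$, a uniform distribution on the simplex puts probability $3 \times 0.2^2 = 0.12$ on the event $\max_i w_i > 0.8$, but normalized uniforms concentrate near the centroid and fall well short of that.

```python
import random

random.seed(0)
N = 100_000
hits = 0
for _ in range(N):
    u = [random.random() for _ in range(3)]  # three U(0,1) draws
    s = sum(u)
    w = [x / s for x in u]                   # normalize to sum to 1
    hits += max(w) > 0.8                     # count mass near a corner of the simplex
frac = hits / N
# frac lands near 0.03, far below the 0.12 a uniform distribution requires
```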

However, given the constraint $\sum_{i} w_i=1$, it seems that the underlying dimensionality of the problem is $n-1$, and that it should be possible to choose a $\mathbf{w}$ by choosing $n-1$ parameters according to some distribution and then computing the corresponding $\mathbf{w}$ from those parameters (because once $n-1$ of the weights are specified, the remaining weight is fully determined).

The problem appears to be similar to the sphere point picking problem (but, rather than picking 3-vectors whose $\ell_2$ norm is unity, I want to pick $n$-vectors whose $\ell_1$ norm is unity).

Thanks!

Chris
  • Your method does not generate a uniformly distributed vector on the simplex. To do what you want correctly, the most straightforward way is to generate $n$ iid $\mathrm{Exp}(1)$ random variables and then normalize them by their sum. You could try to do it by finding some other method to draw only $n-1$ variates directly, but I have my doubts regarding the efficiency tradeoff since $\mathrm{Exp}(1)$ variates can be *very* efficiently generated from $U(0,1)$ variates. – cardinal Aug 09 '11 at 22:24

3 Answers


Choose $\mathbf{x} \in [0,1]^{n-1}$ uniformly (by means of $n-1$ uniform reals in the interval $[0,1]$). Sort the coefficients so that $0 \le x_1 \le \cdots \le x_{n-1}$. Set

$$\mathbf{w} = (x_1, x_2-x_1, x_3 - x_2, \ldots, x_{n-1} - x_{n-2}, 1 - x_{n-1}).$$

Because we can recover the sorted $x_i$ by means of the partial sums of the $w_i$, the mapping $\mathbf{x} \to \mathbf{w}$ is $(n-1)!$ to 1; in particular, its image is the $n-1$ simplex in $\mathbb{R}^n$. Because (a) each swap in a sort is a linear transformation, (b) the preceding formula is linear, and (c) linear transformations preserve uniformity of distributions, the uniformity of $\mathbf{x}$ implies the uniformity of $\mathbf{w}$ on the $n-1$ simplex. In particular, note that the marginals of $\mathbf{w}$ are not necessarily independent.
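The sort-and-difference construction above can be sketched as follows (a minimal Python sketch; the function name is mine):

```python
import random

def simplex_uniform_spacings(n, rng=random):
    # Draw n-1 uniforms on [0, 1] and sort them
    x = sorted(rng.random() for _ in range(n - 1))
    # The spacings between 0, x_1, ..., x_{n-1}, 1 are the weights
    pts = [0.0] + x + [1.0]
    return [b - a for a, b in zip(pts, pts[1:])]

random.seed(0)
w = simplex_uniform_spacings(5)
# w has 5 nonnegative entries summing to 1 (up to floating point)
```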

[3D point plot]

This 3D point plot shows the results of 2000 iterations of this algorithm for $n=3$. The points are confined to the simplex and are approximately uniformly distributed over it.


Because the execution time of this algorithm is $O(n \log(n)) \gg O(n)$, it is inefficient for large $n$. But this does answer the question! A better way (in general) to generate uniformly distributed values on the $n-1$-simplex is to draw $n$ uniform reals $(x_1, \ldots, x_n)$ on the interval $[0,1]$, compute

$$y_i = -\log(x_i)$$

(which makes each $y_i$ positive with probability $1$, whence their sum is almost surely nonzero) and set

$$\mathbf w = (y_1, y_2, \ldots, y_n) / (y_1 + y_2 + \cdots + y_n).$$

This works because each $y_i$ has an $\mathrm{Exp}(1)$ (that is, $\Gamma(1)$) distribution, which implies $\mathbf w$ has a Dirichlet$(1, 1, \ldots, 1)$ distribution--and that is uniform on the simplex.
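The exponential method above can be sketched like this (mine; the function name is illustrative):

```python
import math
import random

def simplex_uniform(n, rng=random):
    # -log(U) for U ~ U(0,1) has an Exp(1) = Gamma(1) distribution;
    # each y_i is positive with probability 1, so the sum is a.s. nonzero
    y = [-math.log(rng.random()) for _ in range(n)]
    s = sum(y)
    return [v / s for v in y]

random.seed(1)
w = simplex_uniform(4)
# w has 4 nonnegative entries summing to 1 (up to floating point)
```

Unlike the sorting construction, this runs in $O(n)$ time per draw.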

[3D point plot 2]

whuber
  • Is this an algorithm for sampling from Dir(1)? – Chris Aug 10 '11 at 10:48
  • @Chris If by "Dir(1)" you mean the Dirichlet distribution with parameters $(\alpha_1, \ldots, \alpha_n)$ = $(1,1,\ldots,1)$, then the answer is yes. – whuber Aug 10 '11 at 12:04
  • (+1) One minor comment: The intuition is excellent. Care in interpreting (a) may need to be taken, as it seems that the "linear transformation" in that part is a *random* one. However, this is easily worked around at the expense of additional formality by using exchangeability of the generating process and a certain invariance property. – cardinal Aug 10 '11 at 12:25
  • More explicitly: For distributions with a density $f$, the density of the order statistics of an iid sample of size $n$ is $n! f(x_1)\cdots f(x_n) 1_{(x_1 < x_2 < \cdots < x_n)}$. In the case of $f = 1_{[0,1]}(x)$, the distribution of the order statistics is uniform on a polytope. Taken from this point, the remaining transformations are deterministic and the result follows. – cardinal Aug 10 '11 at 12:47
  • @cardinal That's an interesting point, but I don't think it matters, although you're right that additional details could help. The swaps (actually reflections, *qua* linear transformations) are not random: they are predetermined. In effect, $I_{n-1}=[0,1]^{n-1}$ is carved into $(n-1)!$ regions, of which one is distinguished from the others, and there's a predetermined affine bijection between each region and the distinguished one. Whence, the only additional fact we need is that a uniform distribution on a region is uniform on any measurable subset of it, which is a complete triviality. – whuber Aug 10 '11 at 12:51
  • @whuber: Interesting remarks. Thanks for sharing! I always appreciate your insightful thoughts on such things. Regarding my previous comment on "random linear transformation", my point was that, at least through $\mathbf{x}$, the transformation used depends on the sample point $\omega$. Another way to think of it is there is a fixed, predetermined function $T: \mathbb{R}^{n-1} \to \mathbb{R}^{n-1}$ such that $\mathbf{w} = T(\mathbf{x})$, but I wouldn't call that function linear, though it is linear on subsets that partition the $(n-1)$-cube. :) – cardinal Aug 10 '11 at 16:48
  • @cardinal That's correct (and so I have been careful not to call *T* linear!): the function is *piecewise* affine. We're really just applying the standard formula for a change of variable, paying attention to the fact that the mapping is not one-to-one, and exploiting the constancy of the Jacobians involved. But, having made those observations, it's clear we don't have to actually compute the Jacobians. – whuber Aug 10 '11 at 17:04
    n <- 10                                # illustrative; set the number of weights
    zz <- c(0, log(-log(runif(n - 1))))    # first entry fixed at zero for identification
    ezz <- exp(zz)
    w <- ezz / sum(ezz)

The first entry is set to zero for identification; you would see the same done in multinomial logistic models. Of course, in multinomial models you would also have covariates under the exponents, rather than just the random `zz`s. The `zz`s follow the extreme value distribution; this is needed for the resulting weights to be identically distributed. I initially put `rnorm` draws there, but then had a gut feeling that it wasn't going to work.

StasK
  • That doesn't work. Did you try looking at a histogram? – cardinal Aug 09 '11 at 21:53
  • So what should we expect to see on a histogram? I don't know, frankly. The distribution of weights can't be U[0,1]. `hist(w)` is a skewed distribution, but that's inevitable with non-linear transformations of i.i.d.s. – StasK Aug 09 '11 at 21:58
  • You edited your answer since I made that remark. – cardinal Aug 09 '11 at 22:01
  • Your answer is now almost correct. If you generate $n$ iid $\mathrm{Exp}(1)$ and divide each by the sum, then you will get the correct distribution. See [Dirichlet distribution](http://en.wikipedia.org/wiki/Dirichlet_distribution) for more details, though it doesn't discuss this *explicitly*. – cardinal Aug 09 '11 at 22:02
  • So do you think it's appropriate now? – StasK Aug 09 '11 at 22:02
  • It would be simpler to remove both the extra logarithm and the exponentiation. – whuber Aug 09 '11 at 22:13
  • Well, but the question is, how to reduce the dimensionality by 1. Of course if you start with n i.i.d. Exp(1) variables, you will get identically distributed, but not independent weights. The real question is, what are the $n-1$ variables you'd need to start with to obtain $n$ weights that are identically distributed, although no longer independent. – StasK Aug 09 '11 at 22:16
  • Given the terminology you are using, you sound a little confused. – cardinal Aug 09 '11 at 22:21
  • Actually, the Wiki link *does* discuss this (fairly) explicitly. See the second paragraph under the **Support** heading. – cardinal Aug 09 '11 at 22:28
  • I guess I am... now. OK, let's retract steps from the Dirichlet distribution: `y – StasK Aug 09 '11 at 23:26
  • Yet another way to ask this question is: is there a distribution with support $x>0$ such that drawing $n-1$ i.i.d. variables and computing `w – StasK Aug 10 '11 at 00:20
  • This characterization is both too restrictive and too general. It is too general in that the resulting distribution of $\mathbf{w}$ must be "uniform" on the $n-1$ simplex in $\mathbb{R}^n$. It is too restrictive in that the question is worded generally enough to allow that $\mathbf{w}$ be some function of an $n-1$-variate distribution, which in turn *presumably*, but not necessarily, consists of $n-1$ independent (and *perhaps* iid) variables. – whuber Aug 10 '11 at 02:07
  • My upvote was on the basis of this answer having provoked an interesting discussion. – DWin Jul 18 '18 at 00:30

The solution is obvious. The following MATLAB code provides the answer for 3 weights.

function [ ] = TESTGEN( )
SZ = 1000;                           % number of weight vectors to sample
V  = zeros(1, 3);
VS = zeros(SZ, 3);
for NIT = 1:SZ
    V(1) = rand(1, 1);               % uniform draw on [0, 1]
    V(2) = rand(1, 1) * (1 - V(1));  % uniform draw on the remaining mass
    V(3) = 1 - V(1) - V(2);
    PERM = randperm(3);              % random permutation of 1, 2, 3
    for NID = 1:3
        VS(NIT, NID) = V(PERM(NID));
    end
end
figure;
scatter3(VS(:, 1), VS(:, 2), VS(:, 3));
end

[3D scatter plot of the generated weights]

user96990
  • Your marginals do not have the correct distribution. Judging from the Wikipedia article on the Dirichlet distribution (random number generation section, which has the algorithm you have coded), you should be using a beta(1,2) distribution for V(1), not a uniform[0,1] distribution. – soakley Dec 03 '15 at 18:43
  • It does appear that the density increases in the corners of this tilted triangle. Nonetheless, it provides a nice geometric display of the problem. – DWin Jul 18 '18 at 00:33