17

I have this problem: I'm trying to sample the relation $$ \sum_{i=1}^N x_i = 1 $$ in the domain where $x_i>0\ \forall i$. Right now I'm just extracting $N$ random numbers $u_i$ from a uniform distribution on $[0,1]$ and then I transform them into $x_i$ by using $$ x_i = \frac{u_i}{\sum_{j=1}^N{u_j}}. $$
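A minimal sketch of this normalization approach in Python (the function name is mine, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_by_normalizing(n):
    """Draw n uniforms and divide by their sum so the result sums to 1."""
    u = rng.uniform(size=n)
    return u / u.sum()

x = sample_by_normalizing(3)  # uses 3 random draws for one sample
```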

This is correct, but it requires me to draw $N$ values $u_i$ (three, in my case) to obtain a single sample of the relation.

If $N=2$, then the relation simplifies to $x_1 + x_2 = 1$, so I can extract a single $u_1 = x_1$ and evaluate the other value as $x_2 = 1 - x_1$, therefore using only one extraction. But if I try the same with $N=3$, I might draw $u_1 = 0.8$ and $u_2 = 0.7$: I can't use the first relation directly to evaluate the $x_i$, and I cannot combine it with the second one, since I don't know $\sum u_i$.

How can I sample $x_i$ that satisfy the first relation while picking only $N-1$ values from a random distribution (preferably the uniform one)? And what kind of mathematical problem is this?

PS I'm not a mathematician and this is my first question here. I've searched for an answer, but I don't think I've defined the problem well enough to know what to search for. I feel the solution might be close to this, but I can't figure out how. So I would appreciate any help in defining the problem better from a mathematical point of view, as well as in finding its solution (of course xD)

PS2 also a help to better choose the tags and the title would be appreciated

LordM'zn
  • A good mathematical problem. Practically speaking, the processing you are doing on the $x_i$ probably far outweighs the time to generate them, so generating one extra is very small overhead. – Ross Millikan Sep 23 '13 at 17:14
  • You're right about the time, but the $x_i$ are involved in an optimization, so having one more variable to be optimized makes the problem longer and/or the solution worse. – LordM'zn Sep 24 '13 at 09:09

2 Answers

13

Mathematically, you are trying to uniformly sample points on a simplex, which is equivalent to sampling from a symmetric Dirichlet distribution with all parameters equal to $1$ (and that, in turn, can be done by drawing exponential random variables and normalizing, as below).

Here's how to see all this: take the unit interval and pick $N-1$ points uniformly at random and independently. These points partition the interval into $N$ segments, so take your $x_i$ to equal the lengths of the segments. To implement this, you draw the $N-1$ points at random and then sort them in increasing order; each individual segment length then follows a $\mathrm{Beta}(1, N-1)$ distribution.
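The segment construction above can be sketched in Python as follows (the function name is mine; it uses only $N-1$ uniform draws, at the cost of a sort):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_simplex_spacings(n):
    """Uniform sample on the (n-1)-simplex from n-1 sorted uniform draws."""
    u = np.sort(rng.uniform(size=n - 1))
    # pad with the interval endpoints 0 and 1, then take the segment lengths
    points = np.concatenate(([0.0], u, [1.0]))
    return np.diff(points)

x = sample_simplex_spacings(3)  # uses only 2 random draws
```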

If you need to program this, here's one efficient way of doing it without sorting:

1) Generate $N$ independent points $E_i$ from an exponential distribution by drawing $U_i$ from a uniform distribution on $[0,1]$ and then computing the negative logarithm: $E_i:=-\log(U_i)$.

2) Sum all samples to get $S:=\sum_{i=1}^{N}E_i$.

3) The vector $x:=(x_1,\ldots,x_{N})$, where $x_i=E_i/S$, is uniformly distributed on the simplex.

Alex R.
  • Thank you, the geometric explanation really cleared my mind. The only problem is the need for sorting, which I fear can be a problem for the application I'm facing - so I'd prefer to use the second method, but I don't understand it completely. – LordM'zn Sep 24 '13 at 09:26
  • I've tried it in Matlab, and the $x_i$ generated doesn't have $\sum x_i = 1$. Also I'm still generating $N$ numbers, not $N-1$. I guess I'm missing something. Here's the code: `c = zeros(1,1000000); for i=1:1000000 u = rand(1,3); E = -log(u); x = E./s; c(i) = sum(x); end; mean(c) = 3.0018e-07 ` Sorry for the double comments, but as I've understood how to format the code, then I discovered the comments can be modified only for 5 minutes xD – LordM'zn Sep 24 '13 at 09:33
  • @LordM'zn: something seems very off in your code. You compute the $E$ just fine but then you divide $x$ by $s$ when $s$ hasn't been computed? You also seem to be updating $c$ at every step and it's unclear what you intend to compute with it. Moreover, since you initialized your $c$ to have 1000000 zeros, mean(c) will still be using those extra 0's. Notice that the algorithm does everything at once. You generate the $N$ numbers: the point is that we take care of the constraint that they all add to 1 at the very end. – Alex R. Sep 24 '13 at 16:31
  • you're right about $s$ not computed. I've added `s = sum(E);` before `x = E./s` and now it works: the purpose of $c$ and the `for` cycle is to repeat the experiment multiple times to see if something goes wrong. Even after 1 million of experiment (3 million of extraction) the mean of $c$ is perfectly one. So thank you for this. But with this method, there's no need for sorting, but I still have to generate $N$ points, not $N-1$ as in segments method. This is the only thing the prevent me to accept your answer (which I've upvoted nonetheless ;) – LordM'zn Sep 25 '13 at 08:11
  • @lordM'zn: I'm confused: the segment lengths correspond exactly to the $x_i$. It's precisely as if you are uniformly randomly picking points $x_i>0$ with $\sum_i x_i=1$. – Alex R. Sep 25 '13 at 16:10
  • Yes indeed, but my issue is how many points I have to pick from the random distribution. I've simplified the previous code and added the code for the sorting method: http://www.wepaste.com/simplexSampling/ As you can see, in the first method I pick 3 numbers with `rand(1,3)`. In the second, I pick just 2 of them with `rand(1,2)`. What I would like to have is a method that keeps picking only two numbers and doesn't need the sorting. PS I'm sorry for the confusion, I find it's very difficult to communicate with someone who has a different scientific background – LordM'zn Sep 26 '13 at 08:06
4

There is also an analytic expression you can use, to map uniformly distributed points inside a $d$ dimensional unit hypercube to uniformly distributed points inside a simplex of the same dimension with nodes $[\boldsymbol\xi_0, \boldsymbol\xi_1,\cdots,\boldsymbol\xi_{d}]$, where $\boldsymbol\xi_i\in\mathbb{R}^{d}$:

\begin{align} M_{d}(r_1, \cdots,r_d) = \boldsymbol\xi_0 + \sum_{i=1}^{d}\prod_{j=1}^i r_{d-j+1}^{\frac{1}{{d-j+1}}}(\boldsymbol\xi_i - \boldsymbol\xi_{i-1}). \end{align}

The $r_{i}$ are the points inside the unit hypercube, distributed as $\mathcal{U}[0,1]$, and the points inside the simplex (i.e. $M_d\in\mathbb{R}^d$) are also distributed uniformly. This formula is derived in an appendix of this article: https://doi.org/10.1016/j.jcp.2015.12.034

When I derived this formula I wasn't thinking of sampling on the (unit) simplex, but with a bit of experimenting, I think this is still possible by simply setting $r_d=1$, and of course by setting the nodes $\boldsymbol\xi_i$ to those of the unit simplex.

Below you'll find a Python implementation for sampling inside or on the (unit) simplex with a 2D, 3D, 4D and 100D example:

import numpy as np
import matplotlib.pyplot as plt

def sample_simplex(xi, n_mc=1, on_simplex=False):
    """
    Use an analytical function map to points in the d-dimensional unit hypercube to a
    d-dimensional simplex with nodes xi.

    Parameters
    ----------
    xi: array of floats, shape (d + 1, d)
        The nodes of the d dimensional simplex.

    n_mc : int, optional (default is 1)
        Number of samples to draw from inside the simplex.

    on_simplex : boolean, optional (default is False)
        If True, sample on the simplex rather than inside it.

    Returns
    -------
    array, shape (n_mc, d)
        n_mc uniformly distributed points inside (or on) the d-dimensional
        simplex with nodes xi.
    """
    d = xi.shape[1]
    P = np.zeros([n_mc, d])
    for k in range(n_mc):
        # random points inside the unit hypercube
        r = np.random.rand(d)
        # sample on, instead of inside the simplex (all samples will sum to one)
        if on_simplex:
            r[-1] = 1

        # the first term of the map is \xi_0
        sample = np.copy(xi[0])
        for i in range(1, d + 1):
            prod_r = 1.
            # compute the product of r-terms: prod(r_{d-j+1}^{1/(d-j+1)})
            for j in range(1, i + 1):
                prod_r *= r[d - j]**(1. / (d - j + 1))
            # compute the ith term of the sum: prod_r*(\xi_i-\xi_{i-1})
            sample += prod_r * (xi[i] - xi[i - 1])
        P[k, :] = sample

    return P

plt.close('all')

# triangle points
xi = np.array([[0.0, 0.0], [1.0, 0.0], [0, 1]])
samples = sample_simplex(xi, 1000, on_simplex=True)
fig = plt.figure()
ax = fig.add_subplot(111)
# plot random samples
ax.plot(samples[:,0], samples[:, 1], '.')
# plot edges
ax.plot(xi[:, 0], xi[:, 1], 'o')
plt.tight_layout()

# 3D simplex
xi = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0, 1, 0], [0,0,1]])
samples = sample_simplex(xi, 1000, on_simplex=True)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# plot random samples
ax.scatter(samples[:,0], samples[:, 1], samples[:,2], '.')
# plot edges
ax.scatter(xi[:, 0], xi[:, 1], xi[:, 2], 'o')
plt.tight_layout()

# 4D simplex
xi = np.array([[0,0,0,0], [1.0, 0.0, 0.0, 0], [0, 1, 0, 0], [0,0,1,0], [0,0,0,1]])
samples = sample_simplex(xi, 1000, on_simplex=True)
# These should all sum to one
print(np.sum(samples, axis=1))

# 100D simplex
xi = np.zeros([1, 100])
xi = np.append(xi, np.eye(100), axis=0)
samples = sample_simplex(xi, 1000, on_simplex=True)
# These should all sum to one
print(np.sum(samples, axis=1))

plt.show()