20

I am looking for a simple way to sample from a multivariate von Mises-Fisher distribution in Python. I have looked in the stats module in scipy and the numpy module but only found the univariate von Mises distribution. Is there any code available? I have not found yet.

Apparently, Wood (1994) has designed an algorithm for sampling from the vMF distribution according to this link, but I can't find the paper.

-- edit For precision, I am interested by the algorithm which is hard to find in the literature (most of the papers focus on $S^2$). The seminal article (Wood, 1994) cannot be found for free, to my knowledge.

Stephan Kolassa
  • 95,027
  • 13
  • 197
  • 357
mic
  • 3,848
  • 3
  • 23
  • 38
  • 1
    The input to `scipy.stats.vonmises` can be array like, so you can specify the distribution as an `array`. See this [example](http://nbviewer.ipython.org/gist/saketkc/036a5c5ce8c17847c998) – rightskewed Jun 14 '15 at 04:52
  • Thanks for your answer. But, it seems that it is more a product of 1-D von Mises than a real n-D von Mises-Fisher: `K = vonmises.pdf([x,x], kappa=[[1],[10]])`. A 2-D vMF should have only one real $\kappa$ as parameter. Do you agree? – mic Jun 15 '15 at 09:48
  • I am looking for the algorithm VM* originally in Simulation of the von Mises Fisher distribution (Wood, 1994). Anyone? – mic Jun 15 '15 at 12:46
  • 3
    I found the answers in this thread here really useful. I've provided a slightly cleaned up utility function to do this as part of this package: [https://github.com/clara-labs/spherecluster/blob/develop/spherecluster/util.py](https://github.com/clara-labs/spherecluster/blob/develop/spherecluster/util.py), for those still looking to generate this data. – Jaska Aug 11 '16 at 15:53

2 Answers2

14

Finally, I got it. Here is my answer.

I finally put my hands on Directional Statistics (Mardia and Jupp, 1999) and on the Ulrich-Wood's algorithm for sampling. I post here what I understood from it, i.e. my code (in Python).

The rejection sampling scheme:

def rW(n, kappa, m):
    dim = m-1
    b = dim / (np.sqrt(4*kappa*kappa + dim*dim) + 2*kappa)
    x = (1-b) / (1+b)
    c = kappa*x + dim*np.log(1-x*x)

    y = []
    for i in range(0,n):
        done = False
        while not done:
            z = sc.stats.beta.rvs(dim/2,dim/2)
            w = (1 - (1+b)*z) / (1 - (1-b)*z)
            u = sc.stats.uniform.rvs()
            if kappa*w + dim*np.log(1-x*w) - c >= np.log(u):
                done = True
        y.append(w)
    return y

Then, the desired sampling is $v \sqrt{1-w^2} + w \mu$, where $w$ is the result from the rejection sampling scheme, and $v$ is uniformly sampled over the hypersphere.

def rvMF(n,theta):
    dim = len(theta)
    kappa = np.linalg.norm(theta)
    mu = theta / kappa

    result = []
    for sample in range(0,n):
        w = rW(n, kappa, dim)
        v = np.random.randn(dim)
        v = v / np.linalg.norm(v)

        result.append(np.sqrt(1-w**2)*v + w*mu)

    return result

And, for effectively sampling with this code, here is an example:

import numpy as np
import scipy as sc
import scipy.stats

n = 10
kappa = 100000
direction = np.array([1,-1,1])
direction = direction / np.linalg.norm(direction)

res_sampling = rvMF(n, kappa * direction)
jazzwhiz
  • 123
  • 4
mic
  • 3,848
  • 3
  • 23
  • 38
  • 4
    (+1) Thank you for sharing your answer (especially despite the potential discouragement of having your question initially closed)! – whuber Jun 16 '15 at 14:18
6

(I apologize for the formatting here, I created an account just to reply to this question, since I was also trying to figure this out recently).

The answer of mic isn't quite right, the vector $v$ needs to come from $S^{p-2}$ in the tangent space to $\mu$, that is, $v$ should be a unit vector orthogonal to $\mu$. Otherwise, the vector $v\sqrt{1-w^2}+w\mu$ will not have norm one. You can see this in the example provided by mic. To fix this, use something like:

import scipy.linalg as la
def sample_tangent_unit(mu):
    mat = np.matrix(mu)

    if mat.shape[1]>mat.shape[0]:
        mat = mat.T

    U,_,_ = la.svd(mat)
    nu = np.matrix(np.random.randn(mat.shape[0])).T
    x = np.dot(U[:,1:],nu[1:,:])
    return x/la.norm(x)

and replace

v = np.random.randn(dim)
v = v / np.linalg.norm(v)

in mic's example with a call to

v = sample_tangent_unit(mu)
Kevin
  • 61
  • 2