
I need to generate random variables from a chi distribution (not chi-squared!). There doesn't seem to be a standard mechanism for this in C++ (for example, in Boost::Random), so I am looking for an implementation in C++ (or an easily compatible language such as C or Fortran).

There is one paper in ACM TOMS on the topic. Alternatively, should I simply take the square root of a chi-squared random variable, for which several implementations exist?

Which is likely to be fastest? Unfortunately, I need to draw a lot of them.


Update:

In response to the comments below, here are some notes on my specific application:

  • I am looking for a C/C++ compatible implementation, or sufficient detail to program one myself.
  • The application is CPU sensitive, and is run on a modest embedded platform (400MHz with a scalar double FPU and limited cache, but no sqrt(), sin(), exp() or similar instructions). Since actual run-time is not easy to predict a priori, I am looking for algorithms that are amongst the fastest and known to perform well. I can evaluate each of these on my own hardware.
  • I already have facilities for generating uniform and normal random numbers. I can get chi-squared if necessary.
  • DoF is always an integer, usually between 15 and 39.

The answer I accept will have the following content:

  • Which methods are considered "fast", and how are they implemented?
  • Do they otherwise have known disadvantages (e.g. numerical stability)?
  • Are there any methods that should be specifically avoided?
Metrics
Damien
  • Moderators: can I suggest a new tag of chi or chi-distribution? I don't yet have enough reputation to create it. – Damien Aug 12 '13 at 00:41
  • Done (I'm not a mod, but you don't need one for that) – Glen_b Aug 12 '13 at 01:45
  • Do you need variable degrees of freedom, or just one particular one? If it was one particular one, something like a carefully implemented ziggurat like method might be about as fast as you'd get. (How many is a lot?) – Glen_b Aug 12 '13 at 01:51
  • @Glen_b, multiple degrees of freedom. I've actually misread the number that I need... I only need on the order of thousands per second, albeit on an embedded system. Nevertheless, if the internal implementation avoids transcendental functions, that would be preferred. – Damien Aug 12 '13 at 06:30
    I just did it the "slow" way in R - which is already regarded as being slow. That is, (i) I started with n randomly generated degrees of freedom (so there's no possible speedup in adapting to a specific distribution), and then (ii) generated n chisquared random variables each with its own previously obtained df, then took the square root. For n = 10^5, it did the second stage (generating chi-squares taking square roots & assigning the result) in <0.1s on my slow little laptop. You should be able to do much better (maybe 10-20x?) on a better machine in C++ even doing it the 'dumb-but-easy' way. – Glen_b Aug 12 '13 at 08:03
  • That is, I'd be astonished if you couldn't do 10^6 chi random variables the dumb way in well under a second, or 10^4 in well under a hundredth of a second. Why would you bother with being "smart" in this case? – Glen_b Aug 12 '13 at 08:08
  • The target is a 400 MHz processor with a basic CPU and limited cache. Unlike x86, it does not have hardware sin, cos, or exp. Suffice to say that even 0.5% is valuable to me, and thus the original question stands. – Damien Aug 12 '13 at 11:52
  • You should probably add those conditions to your Q, and then since you want not just 'reasonably fast' but 'the fastest' (without a full specification of the relative speeds of all the operations), my response would be 'uh, that's a decent research problem, have you got say 50K and a few months?' :) – Glen_b Aug 12 '13 at 13:19
  • My general advice would be to write special code to handle small df (especially if it's integer df), and then consider whether you can beat whichever is faster out of a ratio-of-uniforms approach or just using an asymmetric table-mountain proposal for an accept-reject. – Glen_b Aug 12 '13 at 13:19
  • I have updated the scope of the question to reflect the comments in this discussion. – Damien Aug 12 '13 at 23:38
  • "*Which methods are considered "fast", and how are they implemented?*" -- sampling chi-distributions is pretty esoteric, and when you add in your restrictions, there's unlikely to be *any* consensus about what's 'fast'. You're assuming a literature that doesn't exist. We can make suggestions about what should be good approaches in general, but as for what's fast on your machine, as I have already explained, that likely depends on things we don't know, like the relative speed with which you *can* implement various calculations. – Glen_b Aug 13 '13 at 00:01

3 Answers


Here's a pure C implementation that generates a little over three million samples per second (using df=20) on my machine (MacBook Pro i7 @ 2.7 GHz).

I've assumed the only mathematical operations supported natively are multiplication, division, addition, and subtraction, plus the rand() function from the standard library. You can easily replace rand() with something equivalent; after profiling, it seems to be the bottleneck in this program. I've avoided including math.h so as not to pull in extra code.

#include <time.h>
#include <stdio.h>
#include <stdlib.h>
const int HALF_RAND_MAX = RAND_MAX/2;
const float c1 = -3.0/2;
const float c2 = 11.0/6;
const float c3 = -25.0/12;
const float c4 = 137.0/60;
const float c5 = -49.0/20;

// Spare squared normal from the polar method, cached for the next call.
static int hasSpare = 0;
static float spare = 0.0;

float randu() {
  // Generates a random float approximately uniform on [-1.0, 1.0].
  return ((float)rand())/(HALF_RAND_MAX) - 1;
}

inline float lnxinvx(float x) {
  // Approximates ln(x)/x by its Taylor series about x = 1, in Horner form.
  // Accuracy degrades as x approaches 0.
  float v = x - 1;
  return v + v*v*(c1 + v*(c2 + v*(c3 + v*(c4 + v*c5))));
}
float fast_sqrt(float x) {
  // Adapted from Carmack's famous fast inverse square root (Q_rsqrt) trick.
  int i;
  float xhalf = x*0.5f;
  i = *(int*)&x;
  i = 0x5f3759df - (i >> 1);
  x = *(float*)&i;
  x = x*(1.5f - (xhalf * x * x));
  x = x*(1.5f - (xhalf * x * x));
  return 1.0f/x;
}

float squared_random_normal() {
  // Generate a squared random normal variable, or use the spare one.
  if (hasSpare) {
    hasSpare = 0;
    return spare;
  } else {
    float u = randu();
    float v = randu();
    float s = u*u + v*v;
    while (s >= 1 || s == 0) { // also reject s == 0, which would break the log
      u = randu();
      v = randu();
      s = u*u + v*v;
    }
    float mul = -2.0*lnxinvx(s);
    spare = v * v * mul;
    hasSpare = 1;
    return u * u * mul;
  }
}

float rand_chi(unsigned int df) {
  // Add up "df" squared random normals and take the square root.
  float acc = 0;
  unsigned int i;
  for (i = 0; i < df; ++i) {
    acc += squared_random_normal();
  }
  return fast_sqrt(acc);
}

int main(int argc, char** argv) {
  srand(time(NULL));
  int i;
  int df = atoi(argv[1]);
  int n = atoi(argv[2]);
  printf("Using %i degrees of freedom\n", df);
  printf("Generating %i samples\n", n);
  for(i = 0; i < n; ++i) {
    //printf("Next variate: %f\n", rand_chi(df));
    rand_chi(df);
  }
  return 0;
}

And here's the timing information, for 20 degrees of freedom.

Wed Aug 14 00:08:04 ~/Dropbox/RandomCodeSnippets/chidist $ gcc -O3 -funroll-loops -fwhole-program chidist.c -o chidist
Wed Aug 14 00:08:09 ~/Dropbox/RandomCodeSnippets/chidist $ time ./chidist 20 10000000
Using 20 degrees of freedom
Generating 10000000 samples

real    0m3.208s
user    0m3.205s
sys     0m0.002s

Hopefully this is roughly what you were looking for (or is roughly fast enough).

Thanks!

mrdmnd
  • Interesting implementation as a sum of normals. I would replace the series in lnxinvx by its [Horner form](http://reference.wolfram.com/mathematica/ref/HornerForm.html) for stability, but it looks quite worthwhile. – Damien Aug 14 '13 at 05:10
  • I've made some other optimizations that get about 50% improvement. Updating the post now for more details - clever thought about the horner form, too! – mrdmnd Aug 14 '13 at 06:41
  • Looks like I managed to squeeze out a 3x improvement. – mrdmnd Aug 14 '13 at 07:50

To generate chi-distributed random variates in Mathematica, the general expression is

RandomVariate[ChiDistribution[degree of freedom], number of random variates]

For example, to generate five random variates with three degrees of freedom, the code is

RandomVariate[ChiDistribution[3],5]

The output is

{0.753102, 1.6647, 1.25129, 0.456877, 0.632508}
SAAN
  • @Glen_b Then a `C++` tag should be added to the question. I read "an alternative implementation". – SAAN Aug 12 '13 at 04:43
  • Good points; OP should at least clarify whether an algorithm he could implement in C++ is required or if he's looking for something other than C++, which I agree is a possible way to read the post. – Glen_b Aug 12 '13 at 05:16
  • Agreed, clarified - target implementation is for C++ – Damien Aug 12 '13 at 06:30

I haven't yet used the RcppArmadillo package, but you may look into it. It is useful if C++ has been decided on as the language of choice.

Metrics