17

I'd like to generate samples from the blue region defined here:

[figure: the unit square $[0,1]^2$ with the quarter disc $x^2+y^2 \le 1$ removed; the blue region is the part of the square outside the circle]

The naive solution is to use rejection sampling in the unit square, but this provides only a $1-\pi/4$ ($\approx 21.5\%$) efficiency.

Is there some way I can sample more efficiently?

Stephan Kolassa
Cam.Davidson.Pilon
  • 6
    **Hint**: Use symmetry to trivially double your efficiency. – cardinal Jan 26 '17 at 17:55
  • 3
    Oh like: if the value is (0,0), this can be mapped to (1,1)? I love that idea – Cam.Davidson.Pilon Jan 26 '17 at 17:59
  • @cardinal Shouldn't it 4x the efficiency? You can sample in $[0,\ldots,1] \times [0,\ldots,1]$ and then mirror it across x-axis, y-axis and origin. – Martin Krämer Jan 26 '17 at 20:06
  • 1
    @Martin: Across the four symmetric regions, you have overlap, which you have to deal with more carefully. – cardinal Jan 26 '17 at 20:24
  • @cardinal: I may be dense here, but is the overlap not only exactly the x- and y-axis, which together have an area of $0$? – Martin Krämer Jan 26 '17 at 20:28
  • I'm not seeing how it is relevant for the efficiency as you hit the overlap area with probability $0$ (ignoring discretization). – Martin Krämer Jan 26 '17 at 20:33
  • @Martin: It may be that I'm not fully understanding your suggestion. Note that you can trivially accept points below $y = 1 - \sqrt{1-(1-x)^2}$ and reflect them to double your efficiency since this region is completely disjoint from the OP's. But, if one also wants to consider points "above" $y = \sqrt{1-(1-x)^2}$, then that region intersects with each of the two aforementioned regions. So, if $(x,y)$ falls in one of those intersections (positive area!), one has to handle how to do the reflection for that region appropriately to maintain the equal-area property of the resultant sampling. – cardinal Jan 26 '17 at 20:36
  • @cardinal My idea was to sample a point $p = (x,y)$ where we sample $x$ and $y$ from a uniform distribution over $[0,\ldots,1]$. Then we can compute $y_{circle} = \sqrt{1 - x^2}$. If $y_{circle} < y$, $y$ is part of the blue area, if $y_{circle} \geq y$, $y$ is not part of the blue area. Given that $p$ is either part of the blue area or not, we can infer that $p_{mirror\_x} = (-x,y)$, $p_{mirror\_y} = (x,-y)$ and $p_{mirror\_origin} = (-x,-y)$ share the same property, thus 4x efficiency. – Martin Krämer Jan 26 '17 at 20:46
  • The points where $x$ or $y$ would be 0, i.e. the axes, would result in lower than 4x efficiency, but as we hit them with probability $0$, it would not change the overall 4x efficiency. That was my train of thought. – Martin Krämer Jan 26 '17 at 20:54
  • 3
    @Martin: If I'm understanding what you're describing, that doesn't increase the efficiency *at all*. (You found one point, and now know three others---in an area four times the size---that either do or don't lie within the unit disk with probability one according to whether $(x,y)$ does. How does that help?) The point of increasing efficiency is to increase the probability of acceptance for each $(x,y)$ generated. Perhaps I am the one being dense? – cardinal Jan 26 '17 at 20:54
  • @cardinal. Ah, ok, I understand it now. Thanks. Yes, you are right. I was talking about how this is 4x more efficient than sampling in $[-1,\ldots,1] \times [-1,\ldots,1]$ as each sample would give 4 points instead of 1. But of course it does not change the probability of the acceptance itself. – Martin Krämer Jan 26 '17 at 20:58
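The doubling trick cardinal hints at can be sketched in code. The region below $y = 1-\sqrt{1-(1-x)^2}$ (outside the circle of radius 1 centered at $(1,1)$) is congruent to the blue region and disjoint from it, so a square draw landing there can be reflected through $(1/2,1/2)$ instead of being rejected, raising the acceptance rate to $2(1-\pi/4) \approx 43\%$. A minimal Python sketch (my own illustration, not code from the thread):

```python
import random

def sample_blue(rng=random):
    """Sample a point uniformly from the blue region
    (the unit square minus the quarter disc x^2 + y^2 <= 1),
    using reflection to double the acceptance rate."""
    while True:
        x, y = rng.random(), rng.random()
        if x * x + y * y > 1.0:
            # Already in the blue region: accept as-is.
            return x, y
        if (x - 1.0) ** 2 + (y - 1.0) ** 2 > 1.0:
            # In the congruent region near the origin corner:
            # reflect through (1/2, 1/2) into the blue region.
            return 1.0 - x, 1.0 - y
        # Otherwise reject and redraw.
```

Each accepted point is uniform on the blue region because the reflection is area-preserving and the two acceptance regions are disjoint.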

3 Answers

13

I propose the following solution, which should be simpler, more efficient, and/or computationally cheaper than the other solutions by @cardinal, @whuber and @stephan-kolassa so far.

It involves the following simple steps:

1) Draw two standard uniform samples: $$ u_1 \sim \mathrm{Unif}(0,1)\\ u_2 \sim \mathrm{Unif}(0,1). $$

2a) Apply the following shear transformation to the point $(\min\{u_1,u_2\}, \max\{u_1,u_2\})$ (points in the lower right triangle are reflected to the upper left triangle and will be "un-reflected" in 2b): $$ \begin{bmatrix} x\\y \end{bmatrix} = \begin{bmatrix} 1\\1 \end{bmatrix} + \begin{bmatrix} \frac{\sqrt{2}}{2} & -1\\ \frac{\sqrt{2}}{2} - 1 & 0\\ \end{bmatrix} \, \begin{bmatrix} \min\{u_1,u_2\}\\ \max\{u_1,u_2\}\\ \end{bmatrix}. $$

2b) Swap $x$ and $y$ if $u_1 > u_2$.

3) Reject the sample if it falls inside the unit circle (the acceptance rate works out to $(1-\pi/4)/(1-\sqrt{2}/2) \approx 73\%$), i.e. reject if: $$ x^2 + y^2 < 1. $$

The intuition behind this algorithm is shown in the figure. [figure: illustration of the shear transformation mapping the unit square onto a region covering the blue area]

Steps 2a and 2b can be merged into a single step:

2) Apply shear transformation and swap $$ x = 1 + \frac{\sqrt{2}}{2} \min(u_1, u_2) - u_2\\ y = 1 + \frac{\sqrt{2}}{2} \min(u_1, u_2) - u_1 $$

The following code implements the algorithm above (and tests it using @whuber's code).

n.sim <- 1e6
x.time <- system.time({
    # Draw two standard uniform samples
    u_1 <- runif(n.sim)
    u_2 <- runif(n.sim)
    # Apply shear transformation and swap
    tmp <- 1 + sqrt(2)/2 * pmin(u_1, u_2)
    x <- tmp - u_2
    y <- tmp - u_1
    # Reject if inside circle
    accept <- x^2 + y^2 > 1
    x <- x[accept]
    y <- y[accept]
    n <- length(x)
})
message(signif(x.time[3] * 1e6/n, 2), " seconds per million points.")
#
# Plot the result to confirm.
#
plot(c(0,1), c(0,1), type="n", bty="n", asp=1, xlab="x", ylab="y")
rect(-1, -1, 1, 1, col="White", border="#00000040")
m <- sample.int(n, min(n, 1e4))
points(x[m],y[m], pch=19, cex=1/2, col="#0000e010")

Some quick tests yield the following results.

The algorithm at https://stats.stackexchange.com/a/258349. Best of 3: 0.33 seconds per million points.

This algorithm. Best of 3: 0.18 seconds per million points.

Luca Citi
  • 3
    +1 Very well done! Thank you for sharing a thoughtful, clever, and simple solution. – whuber Jan 27 '17 at 14:56
  • Great idea! I was thinking about a mapping from the unit sq to this portion, but didn't think of an _imperfect_ mapping and then a rejection scheme. Thanks for expanding my mind! – Cam.Davidson.Pilon Jan 27 '17 at 18:37
10

Will two million points per second do?

The distribution is symmetric: we only need to work out the distribution for one-eighth of the full circle and then copy it around the other octants. In polar coordinates $(r,\theta)$, the cumulative distribution of the angle $\Theta$ for the random location $(X,Y)$ at the value $\theta$ is given by the area between the triangle $(0,0), (1,0), (1,\tan\theta)$ and the arc of the circle extending from $(1,0)$ to $(\cos\theta,\sin\theta)$. It is thereby proportional to

$$F_\Theta(\theta) = \Pr(\Theta \le \theta) \propto \frac{1}{2}\tan(\theta) - \frac{\theta}{2},$$

whence its density is

$$f_\Theta(\theta) = \frac{d}{d\theta} F_\Theta(\theta) \propto \tan^2(\theta).$$

We may sample from this density using, say, a rejection method (which has efficiency $8/\pi-2 \approx 54.6479\%$).
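Concretely, one such scheme (the one the code below uses) proposes $\theta$ from the density proportional to $\theta$ on $[0,\pi/4]$ and exploits the envelope $\tan^2\theta \le (4/\pi)\,\theta$, which is tight at $\theta=\pi/4$. The acceptance rate is then

$$\frac{\int_0^{\pi/4}\tan^2\theta\,d\theta}{\int_0^{\pi/4}(4/\pi)\,\theta\,d\theta} = \frac{1-\pi/4}{\pi/8} = \frac{8}{\pi}-2.$$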

The conditional density of the radial coordinate $R$ is proportional to $r\,dr$ between $r=1$ and $r=\sec\theta$. That can be sampled with an easy inversion of the CDF.
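Explicitly, the conditional CDF on $1 \le r \le \sec\theta$ is

$$F_{R\mid\Theta}(r\mid\theta) = \frac{r^2-1}{\sec^2\theta-1} = \frac{r^2-1}{\tan^2\theta},$$

so drawing $u\sim\mathrm{Unif}(0,1)$ and solving $F_{R\mid\Theta}(r\mid\theta)=u$ gives $r=\sqrt{1+u\tan^2\theta}$, the formula used in the code below.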

If we generate independent samples $(r_i,\theta_i)$, conversion back to Cartesian coordinates $(x_i,y_i)$ samples this octant. Because the samples are independent, randomly swapping the coordinates produces an independent random sample from the first quadrant, as desired. (The random swaps require generating only a single Binomial variable to determine how many of the realizations to swap.)

Each such realization of $(X,Y)$ requires, on average, one uniform variate (for $R$) plus $1/(8/\pi-2)$ times two uniform variates (for $\Theta$) and a small amount of (fast) calculation. That's $4/(4-\pi) \approx 4.66$ variates per point (which, of course, has two coordinates). Full details are in the code example below. This figure plots 10,000 out of more than a half million points generated.

[figure: 10,000 of the simulated points, uniformly filling the blue region]

Here is the R code that produced this simulation and timed it.

n.sim <- 1e6
x.time <- system.time({
  # Generate trial angles `theta`
  theta <- sqrt(runif(n.sim)) * pi/4
  # Rejection step.
  theta <- theta[runif(n.sim) * 4 * theta <= pi * tan(theta)^2]
  # Generate radial coordinates `r`.
  n <- length(theta)
  r <- sqrt(1 + runif(n) * tan(theta)^2)
  # Convert to Cartesian coordinates.
  # (The products will generate a full circle)
  x <- r * cos(theta) #* c(1,1,-1,-1)
  y <- r * sin(theta) #* c(1,-1,1,-1)
  # Swap approximately half the coordinates.
  k <- rbinom(1, n, 1/2)
  if (k > 0) {
    z <- y[1:k]
    y[1:k] <- x[1:k]
    x[1:k] <- z
  }
})
message(signif(x.time[3] * 1e6/n, 2), " seconds per million points.")
#
# Plot the result to confirm.
#
plot(c(0,1), c(0,1), type="n", bty="n", asp=1, xlab="x", ylab="y")
rect(-1, -1, 1, 1, col="White", border="#00000040")
m <- sample.int(n, min(n, 1e4))
points(x[m],y[m], pch=19, cex=1/2, col="#0000e010")
whuber
  • Whuber! This is a great answer, thank you. Very impressive thought process. – Cam.Davidson.Pilon Jan 26 '17 at 19:19
  • 1
    I don't understand this sentence: "Because the samples are independent, systematically swapping the coordinates every second sample produces an independent random sample from the first quadrant, as desired." It seems to me that systematically swapping the coordinates every second sample produces highly dependent samples. For example, it seems to me that your implementation in code generates half a million samples in a row from the same octant? – A. Rex Jan 26 '17 at 20:24
  • 7
    Strictly speaking, this approach doesn't quite work (for iid points) since it generates an identical number of samples in the two octants: The sample points are, thus, dependent. Now, if you flip unbiased coins to determine the octant for each sample... – cardinal Jan 26 '17 at 20:25
  • 1
    @Cardinal you are correct; I'll fix that--without (asymptotically) increasing the number of random variates to generate! – whuber Jan 26 '17 at 21:13
  • @whuber: Yes, in theory, a single additional uniform random variate is all that is needed. :-) – cardinal Jan 26 '17 at 21:35
  • 2
    Strictly speaking (and, again, only in the purest theoretical sense), in the finite sample case, your modification requires *no* additional uniform random variates. To wit: From the very first uniform random variate, construct the flipping sequence from the first $n$ bits. Then, use the remainder (times $2^n$) as the first coordinate generated. – cardinal Jan 26 '17 at 21:51
  • @A.Rex As I stated, *because the samples are independent*, it doesn't matter which of them we flip. We just have to make sure to flip the correct (random) number of them. – whuber Jan 26 '17 at 23:27
  • @whuber: My complaint was the same as cardinal's. Your fixes have mostly addressed the issue. I do think it'd be nicer if the generated points were an iid sequence (cardinal seems to suggest the same thing, implicit in ".. determine the octant for each sample" as well as "flipping sequence from the first n bits"), but I guess you're just aiming for an iid collection/set. – A. Rex Jan 27 '17 at 04:58
  • @A.Rex It's easy enough to get an iid sequence, but at a price: you have to generate a new Bernoulli variate for each new point. – whuber Jan 27 '17 at 06:04
  • What about using the inverse cdf for generating $\theta$? – Xi'an Jan 27 '17 at 10:08
  • 2
    @Xi'an I was unable to obtain a conveniently computable inverse. I can do slightly better by rejection sampling from the distribution with density proportional to $2\sin(\theta)^2$ (the efficiency is $(4-\pi)/(\pi-2)\approx 75\%$), at the cost of having to compute an arcsine. – whuber Jan 27 '17 at 14:54
7

Well, it can be done more efficiently, but I sure hope you are not looking for *faster*.

The idea would be to sample an $x$ value first, with a density proportional to the length of the vertical blue slice above each $x$ value:

$$ f(x) = 1-\sqrt{1-x^2}. $$

Wolfram helps you to integrate that:

$$ \int_0^x f(y)dy = -\frac{1}{2}x\sqrt{1-x^2}+x-\frac{1}{2}\arcsin x.$$

So the cumulative distribution function $F$ would be this expression, scaled to integrate to 1 (i.e., divided by $ \int_0^1 f(y)dy$).

Now, to generate your $x$ value, pick a random number $t$, uniformly distributed between $0$ and $1$. Then find $x$ such that $F(x)=t$. That is, we need to invert the CDF (inverse transform sampling). This can be done, but it's not easy. Nor fast.

Finally, given $x$, pick a random $y$ that is uniformly distributed between $\sqrt{1-x^2}$ and $1$.

Below is R code. Note that I am pre-evaluating the CDF at a grid of $x$ values, and even then this takes quite a few minutes.

You can probably speed the CDF inversion up quite a bit if you invest some thinking. Then again, thinking hurts. I personally would go for rejection sampling, which is faster and far less error-prone, unless I had very good reasons not to.

epsilon <- 1e-6
xx <- seq(0,1,by=epsilon)
x.cdf <- function(x) x-(x*sqrt(1-x^2)+asin(x))/2
xx.cdf <- x.cdf(xx)/x.cdf(1)

nn <- 1e4
rr <- matrix(nrow=nn,ncol=2)
set.seed(1)
pb <- winProgressBar(max=nn)
for ( ii in 1:nn ) {
    setWinProgressBar(pb,ii,paste(ii,"of",nn))
    x <- max(xx[xx.cdf<runif(1)])
    y <- runif(1,sqrt(1-x^2),1)
    rr[ii,] <- c(x,y)
}
close(pb)

plot(rr,pch=19,cex=.3,xlab="",ylab="")

[figure: scatterplot of the sampled points]
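For what it's worth, the grid search above can be replaced by straightforward bisection on the monotone CDF. A self-contained Python sketch of the same algorithm (function names are my own):

```python
import math
import random

def cdf(x):
    # Unnormalized CDF of f(x) = 1 - sqrt(1 - x^2) on [0, 1].
    return x - (x * math.sqrt(1.0 - x * x) + math.asin(x)) / 2.0

C = cdf(1.0)  # normalizing constant, equal to 1 - pi/4

def sample_x(u, tol=1e-12):
    """Invert F(x) = cdf(x) / C = u by bisection (F is increasing)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) / C < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sample_point(rng=random):
    """One draw from the blue region via inverse-transform sampling."""
    x = sample_x(rng.random())
    y = rng.uniform(math.sqrt(1.0 - x * x), 1.0)
    return x, y
```

Each draw costs about 40 CDF evaluations, which should be considerably cheaper than scanning the million-point grid for every sample, and needs no precomputation.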

Stephan Kolassa