
I would like to create a random data sample to test a clustering algorithm using Python. One specific data sample I would like to generate consists of two non-overlapping annuli; however, it's okay if the innermost annulus is a disk, as in the image below (taken from this question).

[image: example data]

I realize that I can sample N values of angles ti such that 0 ≤ ti ≤ 2π. Based on some reading online, I think one can generate N samples of circular data by specifying a fixed radius r such that xi = r cos(ti) and yi = r sin(ti). Applying this reasoning to the desired annulus, I am thinking one can repeat the process for a variable radius between rmin and rmax; however, I am not sure how to proceed from this point. My thinking is to generate n samples of radii ri (where n < N, since the annulus cluster is a subset of all the data) such that rmin ≤ ri ≤ rmax. The first cluster would then be the n points within the outermost annulus, and the second cluster would be the N-n points within the innermost annulus/circle.

So my questions are as follows:

1) Is my approach so far reasonable? Are there (dis)advantages to this approach, such as clustering more near centers or borders?

2) If my approach is wrong, why is it wrong and what would be a better approach?

EDIT:

The original post that serves as a potential duplicate actually shows how to sample points within a circle, not an annulus. I am concerned with the case of an annulus.

Stephan Kolassa
  • I'm a bit confused. You write of "two non-overlapping disks", but the innermost disk can be a "circle". Circles are one-dimensional. Your actual picture shows an [annulus](https://en.wikipedia.org/wiki/Annulus_(mathematics)) which is concentric with a disk. If this is what you actually want, please edit your post to clarify. Thank you! – Stephan Kolassa May 05 '19 at 03:51
  • That was bad terminology on my part. I edited the post to reflect that I would like to sample points within an annulus (not a disk). –  May 05 '19 at 03:57
  • Your approach will suffer from the problem the poster in the proposed duplicate pointed out, with a higher density near the center than near the outer boundary. This will afflict both the annulus and the disk, but more so the disk. [whuber's answer](https://stats.stackexchange.com/a/120535/1352) in the proposed duplicate directly gives you the disk part, and by constraining the part where `rho` is generated, you get the annulus. – Stephan Kolassa May 05 '19 at 03:59
  • Is my approach of varying the radius between `rmin` and `rmax` the proper way to "constrain the part where `rho` is generated"? –  May 05 '19 at 04:01
  • Do you wish the density on each annulus to be uniform? – Sycorax May 05 '19 at 04:06
  • @Sycorax The density does not need to be uniform. –  May 05 '19 at 04:08
  • Even if you do not care about the density, for a simulation you still need to specify the density distribution. – user158565 May 05 '19 at 04:27
  • @allthemikeysaretaken In that case, your procedure will produce points that lie on the annuluses. Stephan's answer works too, with the further quality that the points have uniform density. – Sycorax May 05 '19 at 04:30
  • @Sycorax I looked at the solution you provided as well, just not enough points to upvote. Thanks for the suggestion, came in handy! –  May 05 '19 at 04:33
  • I added an answer to the duplicate question that can be extended to more than two dimensions, here https://stats.stackexchange.com/a/406914/36041 – Aksakal May 06 '19 at 19:57

1 Answer


Your approach will suffer from the problem the poster in the proposed duplicate pointed out, with a higher density near the center than near the outer boundary. This will afflict both the annulus and the disk, but more so the disk.
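The density problem can be checked numerically. The following is a minimal sketch, not code from either answer: the helper names `naive_radius`, `area_uniform_radius`, and `fraction_inside` are made up for illustration, and it uses only the Python standard library. It draws radii for a unit disk both ways and counts how many fall within half the radius. That inner region covers only a quarter of the area, so roughly 25% of the points should land there if the density is uniform.

```python
import math
import random


def naive_radius(rng, r_min, r_max):
    # The approach from the question: radius drawn uniformly on [r_min, r_max].
    return rng.uniform(r_min, r_max)


def area_uniform_radius(rng, r_min, r_max):
    # Draw the squared radius uniformly, then take the square root.
    return math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))


def fraction_inside(radii, cutoff):
    # Share of sampled radii that are smaller than the cutoff.
    return sum(r < cutoff for r in radii) / len(radii)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

naive = [naive_radius(rng, 0.0, 1.0) for _ in range(n)]
uniform = [area_uniform_radius(rng, 0.0, 1.0) for _ in range(n)]

frac_naive = fraction_inside(naive, 0.5)
frac_uniform = fraction_inside(uniform, 0.5)

print(f"naive radius:        {frac_naive:.3f} of points within r < 0.5")
print(f"area-uniform radius: {frac_uniform:.3f} of points within r < 0.5")
```

With the naive radius about half of the points end up in the inner quarter of the area, which is the crowding near the center described above.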

whuber's answer in the proposed duplicate directly gives you the disk part, and by constraining the part where rho is generated, you get the annulus. Note that you need to constrain "on the square root scale": draw the squared radius uniformly between inner_radius^2 and outer_radius^2, then take the square root.
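One way to see where the square root comes from, as a sketch (the symbols $a$ and $b$ for the inner and outer radius and $U$ for a standard uniform draw are notation introduced here, not taken from whuber's answer): if the points are uniform over the area of the annulus, the probability that the radius is at most $r$ is a ratio of areas,

$$P(\rho \le r) = \frac{\pi r^2 - \pi a^2}{\pi b^2 - \pi a^2} = \frac{r^2 - a^2}{b^2 - a^2}, \qquad a \le r \le b.$$

Setting this equal to $U \sim \mathrm{Unif}(0, 1)$ and solving for the radius gives

$$\rho = \sqrt{a^2 + U\,(b^2 - a^2)},$$

which is the square root of a uniform draw on $[a^2, b^2]$. With $a = 0$ this reduces to the disk case.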

Here is an adaptation of whuber's original R code, which should be easy to translate to Python:

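outer_radius <- 1
inner_radius <- 0.7
n <- 1e4
# draw the squared radius uniformly, then take the square root,
# so that the points are uniform over the area of the annulus
rho <- sqrt(runif(n, inner_radius^2, outer_radius^2))
theta <- runif(n, 0, 2*pi)  # angle, uniform on [0, 2*pi]
x <- rho * cos(theta)
y <- rho * sin(theta)
# semi-transparent points make differences in density visible
plot(x, y, pch=19, cex=0.6, col="#00000020")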

[image: annulus]
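A possible Python translation, as a rough sketch using only the standard library. The function name `sample_annulus`, the radii of the inner disk, the sample sizes, and the label list are assumptions for illustration and are not part of whuber's code. Setting the inner radius to 0 gives a disk, so the same function can produce both clusters from the question.

```python
import math
import random


def sample_annulus(n, inner_radius, outer_radius, rng):
    """Return n (x, y) points uniform over the annulus between the two radii."""
    points = []
    for _ in range(n):
        # Uniform on the squared scale, then square root: uniform over the area.
        rho = math.sqrt(rng.uniform(inner_radius ** 2, outer_radius ** 2))
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((rho * math.cos(theta), rho * math.sin(theta)))
    return points


rng = random.Random(42)  # fixed seed so the sample is repeatable

ring = sample_annulus(1000, 0.7, 1.0, rng)  # outer annulus, as in the R code
core = sample_annulus(500, 0.0, 0.4, rng)   # inner disk (inner radius 0)

points = ring + core
labels = [0] * len(ring) + [1] * len(core)  # ground-truth cluster labels
```

The two clusters do not overlap as long as the outer radius of the inner cluster is smaller than the inner radius of the outer one. For large samples, the same two lines for `rho` and `theta` can be vectorized with an array library.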

Stephan Kolassa