How to sample from Cantor distribution?

Question

What would be the best way to sample from Cantor distribution? It only has cdf and we can't invert it.

Actually, someone asked it on Mathematics : http://math.stackexchange.com/questions/1115907/can-you-simulate-from-a-cantor-distribution — RUser4512, Aug 12 '16 at 15:25
Here are some interesting follow-up questions: what is the standard deviation? What is the moment-generating function? How do they compare to their counterparts for the Uniform$(0,1)$ distribution? :-) — whuber, Aug 12 '16 at 16:20
I like the infinite loop you guys have created by referencing the math.stackexchange post, which links back here :p — Tasos Papastylianou, Aug 13 '16 at 02:12

whuber · Accepted Answer · 2016-08-12T15:42:10.283

25

Easy: sample from a Uniform$(0,1)$ distribution and recode from binary to ternary, interpreting each "1" as a "2". (This is the inverse probability transform approach: it does indeed invert the CDF!)

Here is an R implementation, written in a way that ought to port readily to almost any computing environment.

binary.to.ternary <- function(x) {
  y <- 0
  x <- round(2^52 * x)
  for (i in 1:52) {
    y <- y + 2*(x %% 2)
    y <- y/3
    x <- floor(x/2)
  }
  y
}

n <- 1000
x <- runif(n)
y <- binary.to.ternary(x)
plot(ecdf(y), pch=".")

edited Aug 12 '16 at 15:42

answered Aug 12 '16 at 15:28

whuber

281,159
54
637
1,101

4

Earlier this year I started a slightly fuller implementation at https://github.com/Henrygb/CantorDist.R with functions `rCantor()`, `qCantor()`, `pCantor()` and a less meaningful `dCantor()` – Henry Aug 12 '16 at 19:06
2

@Henry What would `dcantor` implement? As Tim notes, this distribution has no density. It does not have any discrete atoms, either. It's the archetypal example of a continuous but not absolutely continuous distribution. (I like the implementation of `qcantor`, BTW--it's likely fast by virtue of its exploitation of matrix multiplication.) – whuber Aug 12 '16 at 19:08
It can attempt to give a discrete probability with `dCantor(x, continuous=FALSE)`which is the change in the cumulative probability over a step of $2^{-32}$, which usually gives zero but for example positive for $x=1/4$. But with `continuous=TRUE` it attempts a density which is then zero or rarely infinite. I accept neither is meaningful in this context, but I included the `d` function for completeness. For me a more useful topic would be improving `pCantor` – Henry Aug 12 '16 at 19:40
@Henry I would suggest first improving `qCantor`. It loses precision for values near $0$ and $1$. To regain the lost precision, first reduce the calculation to arguments between $0$ and $1/2$. Then scale the argument into the range $[1/2, 1)$ before computing its digits, and scale back when converting to ternary. Once you have that, `qCantor` will be an invaluable help in testing `pCantor`: compare `pCantor(qCantor(x))-x` to zero for a wide selection of arguments `x`. – whuber Aug 12 '16 at 20:54
@Henry `pCantor` has a subtle bug: floating point imprecision in computing `m` can seriously alter the calculated digits. First adding $3^{-32}$ to `m` should solve the problem, as in `m – whuber Aug 12 '16 at 22:43
1

We must keep in mind that we're only dealing with a finite approximation to the actual distribution. Let's say we had 10 ternary digit precision numbers (in practice they'll be longer), and we generated 0.0222020002 to "represent" a variable whose digits extend further. While the same comment applies to any real-valued r.v. with a continuous rv all the "represented" values the finite length approximation could stand for are also "in the set". In the actual Cantor distribution, almost all the "continuations" of that ten-digit sequence are not in the set. ...ctd – Glen_b Aug 13 '16 at 03:48
ctd... it's a bit like drawing a Koch snowflake -- we can "represent" it by say taking a fourth iterate of the process of putting a triangular bump in each straight line, and it looks kind of neat, but that image we generate isn't the snowflake and doesn't share its properties (e.g. it only has a finite perimeter). Whether this matters for this case depends on what you're trying to do with the numbers you generate; in some situations you may want to consider that when the next digits after that last digit might potentially matter you may need to then generate further digits. – Glen_b Aug 13 '16 at 03:52
@Glen_b The RNG described here generates full double-precision values--indeed, with more precision than most RNGs found in `R`. *Every* method of generating random values from a non-discrete distribution is a "finite approximation." A true random variate from a distribution with no atoms almost surely is not computable or even representable in finite space. What might be of more practical interest in this case is that unsophisticated attempts to implement the PDF and CDF can lose substantial precision in the conversions between binary and ternary. – whuber Aug 13 '16 at 16:50
2

@whuber I clearly acknowledged that every method of generating random numbers is finite precision in my second sentence. That you chose to repeat it and the emphasis that you gave it suggests you missed my actual point there; when I represent a continuous variate to finite precision, the real values that such a finite approximation *could* represent are "in the set" we want to generate from. When I represent a variable such as this to finite precision, the real values such a finite approximation could represent are almost all not in the set. It's rather a different case. ... ctd – Glen_b Aug 13 '16 at 22:44
2

ctd ... no criticism of your post was implied; it was a point that readers might overlook, and may want to consider, particularly if they're trying to infer properties of the Cantor set by simulating from it. – Glen_b Aug 13 '16 at 22:51

How to sample from Cantor distribution?

1 Answers1

Linked