42

I've been looking at Monte Carlo simulation recently, and have been using it to approximate constants such as $\pi$ (circle inside a rectangle, proportionate area).

However, I'm unable to think of a corresponding method of approximating the value of $e$ [Euler's number] using Monte Carlo integration.

Do you have any pointers on how this can be done?

Xi'an
  • https://en.wikipedia.org/wiki/List_of_representations_of_e – Tim Feb 04 '16 at 12:22
  • 8
    There are many, many, many ways to do this. That this is so might become evident by contemplating what the `R` command `2 + mean(exp(-lgamma(ceiling(1/runif(1e5))-1)))` does. (If using the log Gamma function bothers you, replace it by `2 + mean(1/factorial(ceiling(1/runif(1e5))-2))`, which uses only addition, multiplication, division, and truncation, and ignore the overflow warnings.) What might be of greater interest would be *efficient* simulations: can you minimize the number of computational steps needed to estimate $e$ to any given accuracy? – whuber Feb 04 '16 at 14:34
  • 4
    What a delightful question! I look forward to reading others' answers. One way that you could really draw attention to this question -- perhaps another half-dozen answers -- would be to revise the question and ask for **efficient** answers, as whuber suggests. That's like catnip for CV users. – Sycorax Feb 05 '16 at 03:17
  • are you looking for a geometric analog for e? There are polynomial expressions that would convert to an integral. Mathworld also has a link, and are more credible than wikipedia. [link](http://mathworld.wolfram.com/eApproximations.html) – EngrStudent Feb 05 '16 at 03:35
  • 1
    @EngrStudent I'm not sure the geometric analog exists for $e$. It's simply not a naturally (pun intended) geometrical quantity like $\pi$. – Aksakal Feb 05 '16 at 13:28
  • 7
    @Aksakal $e$ is an *exceptionally* geometrical quantity. At the most elementary level it appears naturally in expressions for areas related to hyperbolas. At a slightly more advanced level it is intimately connected with all periodic functions, including trigonometric functions, whose geometric content is obvious. The real challenge here is that it just so *easy* to simulate values related to $e$! – whuber Feb 05 '16 at 13:42
  • @user777: efficiency is a silly question: by nature, trying to find an MC approximation that is faster than $(1 - \epsilon)^{-1/\epsilon}$ is probably going to be impossible, so the fastest MC approximation should take that value and perturb it with a random variable that disappears quickly. But there's nothing wrong with silly! My answer goes in the other direction: what's the most inefficient way to estimate $e$? – Cliff AB Feb 05 '16 at 17:28
  • @user777: okay, I couldn't resist the fast approximation. – Cliff AB Feb 05 '16 at 17:35
  • Why not just use the definition to compute an approximation? For example, in R: a – StatsStudent Feb 06 '16 at 02:00
  • @StatsStudent: the question was about using simulation to achieve an approximation. – Xi'an Feb 12 '16 at 10:03
  • @Xi'an, well, I guess I don't understand why anyone would want to use MC simulation to obtain estimates of a value so easily calculated by more direct means. Using MC to estimate e is like using a nuclear bomb to swat at a bug. – StatsStudent Feb 12 '16 at 18:16
  • 2
    @StatsStudent: $e$ by itself is not interesting. However, if this leads to unbiased estimators of quantities like$$\exp\left\{\int_0^x f(y) \text{d}G(y)\right\}$$this may prove most useful for MCMC algorithms. – Xi'an Feb 12 '16 at 18:58
  • Sure, I suppose. If that is the true intention, then why not ask that directly? – StatsStudent Feb 12 '16 at 19:18
  • 1
    Is this approach (implemented in R) too much of a cheat? `n=1e6;1/mean(log(runif(n))< -1)` (i.e. taking the reciprocal of the proportion of log-uniforms below -1). Or perhaps (but it assumes you know $\pi$): `1/mean(cos(tan(runif(n,0,pi))))` (considering that $\int_{-\infty}^\infty \frac{\cos(x)}{\pi(x^2+1)}\,\mathrm{d}x=e^{-1}$) – Glen_b Jun 20 '17 at 02:59
  • 1
    @Glen_b What is remarkable about all the answers posted so far is that they adhere to the implicit requirement that the calculation *not* involve any function related to $e$--that is, they should be limited to rational functions (polynomials and, at worst, root-finding). That obviously excludes $\log$ but also, less obviously, any circular function. – whuber Apr 17 '18 at 13:58

8 Answers

45

A simple and elegant way to estimate $e$ by Monte Carlo is described in this paper. The paper is actually about teaching $e$, so the approach seems a perfect fit for your goal. The idea is based on an exercise from a popular Ukrainian textbook on probability theory by Gnedenko; see exercise 22 on p. 183.

It so happens that $E[\xi]=e$, where $\xi$ is the random variable defined as follows: it is the smallest $n$ such that $\sum_{i=1}^n r_i>1$, where the $r_i$ are random numbers from the uniform distribution on $[0,1]$. Beautiful, isn't it?!

Since it's an exercise, I'm not sure if it's cool for me to post the solution (proof) here :) If you'd like to prove it yourself, here's a tip: the chapter is called "Moments", which should point you in the right direction.

If you want to implement it yourself, then don't read further!

This is a simple algorithm for Monte Carlo simulation. Draw a uniform random number, then another, and so on, until the sum exceeds 1. The number of draws is your first trial. Let's say you got:

 0.0180
 0.4596
 0.7920

Then your first trial produced 3. Keep doing these trials, and you'll notice that on average you get $e$.

MATLAB code, simulation result and the histogram follow.

N = 10000000;   % number of trials
n = N;          % trials remaining
s = 0;          % running sum of uniforms in the current trial
i = 0;          % number of draws in the current trial
maxl = 0;       % longest trial length seen so far
f = 0;          % frequency array: f(i) = number of trials of length i
while n > 0
    s = s + rand;
    i = i + 1;
    if s > 1                 % current trial finished
        if i > maxl          % first trial of this length: grow the frequency array
            f(i) = 1;
            maxl = i;
        else
            f(i) = f(i) + 1;
        end
        i = 0;
        s = 0;
        n = n - 1;
    end
end

disp ((1:maxl)*f'/sum(f))    % mean trial length = estimate of e
bar(f/sum(f))                % histogram of trial lengths
grid on

f/sum(f)                     % empirical PMF of the trial length

The result and the histogram:

2.7183


ans =

  Columns 1 through 8

         0    0.5000    0.3332    0.1250    0.0334    0.0070    0.0012    0.0002

  Columns 9 through 11

    0.0000    0.0000    0.0000

[histogram of the simulated trial-length distribution]

UPDATE: I updated my code to get rid of the array of trial results so that it doesn't eat up RAM. I also printed the PMF estimate.

Update 2: Here's my Excel solution. Put a button in Excel and link it to the following VBA macro:

Private Sub CommandButton1_Click()
n = Cells(1, 4).Value
Range("A:B").Value = ""
s = 0
i = 0
maxl = 0
Cells(1, 2).Value = "Frequency"
Cells(1, 1).Value = "n"
Cells(1, 3).Value = "# of trials"
Cells(2, 3).Value = "simulated e"
While n > 0
    s = s + Rnd()            ' add a uniform draw to the running sum
    i = i + 1                ' count draws in the current trial
    If s > 1 Then            ' trial finished: tally a trial of length i
        If i > maxl Then
            Cells(i, 1).Value = i
            Cells(i, 2).Value = 1
            maxl = i
        Else
            Cells(i, 1).Value = i
            Cells(i, 2).Value = Cells(i, 2).Value + 1
        End If
        i = 0
        s = 0
        n = n - 1
    End If
Wend


s = 0
For i = 2 To maxl
    s = s + Cells(i, 1) * Cells(i, 2)   ' sum of (trial length) * (frequency)
Next


Cells(2, 4).Value = s / Cells(1, 4).Value   ' mean trial length = simulated e

Rem bar (f / Sum(f))
Rem grid on

Rem f/sum(f)

End Sub

Enter the number of trials, such as 1000, in cell D1, and click the button. Here is how the screen should look after the first run:

[screenshot of the Excel sheet after the first run]

UPDATE 3: Silverfish inspired me to try another way, not as elegant as the first one but still cool: estimating the volumes of $n$-simplices with Sobol sequences. Since the volume of the standard $n$-simplex is $1/n!$, summing these estimated volumes (together with the first two terms of the series) recovers $e=\sum_{n\ge 0} 1/n!$.

s = 2;                          % first two terms of e = sum 1/n! (n = 0 and n = 1)
for i=2:10
    p=sobolset(i);              % Sobol sequence in dimension i
    N = 10000;
    X=net(p,N)';                % N quasi-random points in the unit hypercube
    s = s + (sum(sum(X)<1)/N);  % fraction with coordinate sum < 1, i.e. the volume 1/i! of the i-simplex
end
disp(s)

2.712800000000001

Coincidentally, Sobol wrote the first book on the Monte Carlo method that I read, back in high school. It's the best introduction to the method in my opinion.

UPDATE 4:

Silverfish suggested a simple Excel formula implementation in the comments. This is the kind of result you get with his approach after a total of about 1 million random numbers and 185K trials:

[screenshot of the spreadsheet results]

Obviously, this is much slower than the Excel VBA implementation, especially if you modify my VBA code so that it doesn't update the cell values inside the loop and only writes them once all the statistics are collected.

UPDATE 5

Xi'an's solution #3 is closely related (or, in some sense, even the same, as per jwg's comment in the thread). It's hard to say who came up with the idea first, Forsythe or Gnedenko. Gnedenko's original 1950 edition in Russian doesn't have Problems sections in its chapters, so at a first glance I couldn't find where the problem from the later editions appears. Maybe it was added later, or it is buried in the text.

As I commented in Xi'an's answer, Forsythe's approach is linked to another interesting area: the distribution of distances between peaks (extrema) in random (IID) sequences. The mean distance happens to be 3. The down sequence in Forsythe's approach ends with a bottom, so if you continue sampling you'll get another bottom at some point, then another, and so on. You could track the distances between them and build the distribution.
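
Here is a quick R sketch checking that claim (the sequence length is just an illustrative choice): mark the interior local minima of an IID uniform sequence and average the gaps between consecutive ones.

n <- 1e6
u <- runif(n)
is_min <- c(FALSE, u[2:(n-1)] < u[1:(n-2)] & u[2:(n-1)] < u[3:n], FALSE)  # interior local minima
mean(diff(which(is_min)))   # average gap between consecutive bottoms, close to 3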

Aksakal
  • 1
    Wow, that's cool! Could you add a paragraph or two explaining why this works? – Sycorax Feb 05 '16 at 04:10
  • @user777, updated an answer – Aksakal Feb 05 '16 at 04:15
  • 11
    (+1) Brilliant! The answer deserves the highest mark as it only relies on uniform simulations. And does not use any approximation but the one due to Monte Carlo. That it connects back to Gnedenko is a further perk. – Xi'an Feb 05 '16 at 15:10
  • @Xi'an, yes, Russian text books have sick exercises sometimes – Aksakal Feb 05 '16 at 15:16
  • @Xi'an Yes, this answer -- and its analogues, such as the ones whuber has offered in comments -- is clearly the best offered so far. – Sycorax Feb 05 '16 at 15:19
  • 2
    Cool! Here is _Mathematica_ code for the same, as a one-liner: `Mean[Table[ Length[NestWhileList[(Random[]+#) &, Random[], #<1&]], {10^6}]]` – wolfies Feb 05 '16 at 15:42
  • 5
    @wolfies The following direct translation of the `R` solution I posted in Xi'an's answer is twenty times faster: `n=10^6; 1. / Mean[UnitStep[Differences[Sort[RandomReal[{0, n}, n + 1]]] - 1]] ` – whuber Feb 05 '16 at 16:47
  • @whuber Cool code - but the Xian method used has higher variance than Aksakal method ... I have posted a comparison somewhere way below :) – wolfies Feb 05 '16 at 19:06
  • I guess I'm not good in Mathematica anymore, no clue what this code in the comments is :) – Aksakal Feb 05 '16 at 19:49
  • Thanks, this is great! Is this possible to implement in Excel? – statisticnewbie12345 Feb 06 '16 at 17:36
  • 1
    I have posted the "why is the mean $e$?" question as a [question in its own right](http://stats.stackexchange.com/q/194352/22228); I suspect my sketch solution (which is what immediately came to mind as the "obvious" visualisation of the problem) isn't necessarily the way that Russian students were intended to do it! So alternative solutions would be very welcome. – Silverfish Feb 06 '16 at 18:04
  • @statisticnewbie12345, the easiest way in Excel is with VBA. You can translate my code to it, it has RAND() function for uniform randoms. Let me know if you have troubles, I'll put something together – Aksakal Feb 06 '16 at 19:02
  • @statisticnewbie12345, I posted my Excel solution. As I suspected it was a no brainer port from MATLAB. – Aksakal Feb 06 '16 at 19:46
  • @statisticnewbie12345 A v basic spreadsheet implementation: set cells A1 to E1 as titles `Uniform`, `Sum`, `Position`, `Series length` and `Estimate`. Set cells A2 to C2 as `=RAND()`, `=A2`, and `1`. Clone A2 to A3. Set B3 to `=IF(B2>1,A3,B2+A3)` (so sum accumulates unless already above 1 in which case it restarts). Set C3 to `=IF(B2>1,1,C2+1)` (so position increments, or restarts if sequence does). To measure length of a completed sequence set D3 to `=if(B3>1,C3,"")`. Clone *row 3 only* down a long way; set E2 to `=AVERAGE(D:D)` (or D2:D-whatever in LibreOffice) as mean of the lengths – Silverfish Feb 06 '16 at 20:02
  • Thanks Silverfish! I've run this with 100,000 rows and taken 30 samples (Central limit theorem) but it is still only accurate to 1 decimal place (2.7014983) but falls within a 95% CI. – statisticnewbie12345 Feb 07 '16 at 12:54
  • @statisticnewbie12345, yes, the convergence is not fast, look at the example I give with 1M RAND() calls in Silverfish's Excel schema in my update. – Aksakal Feb 07 '16 at 17:29
20

I suggest upvoting Aksakal's answer. It is unbiased and relies only on a method of generating unit uniform deviates.

My answer can be made arbitrarily precise, but still is biased away from the true value of $e$.

Xi'an's answer is correct, but I think its dependence on either the $\log$ function or a way of generating Poisson random deviates is a bit circular when the purpose is to approximate $e$.

Estimating $e$ by Bootstrapping

Instead, consider the bootstrapping procedure. One has a large number $n$ of objects, which are drawn with replacement to form a sample of size $n$. At each draw, the probability of not drawing a particular object $i$ is $1-n^{-1}$, and there are $n$ such draws. The probability that a particular object is omitted from all draws is $p=(1-\frac{1}{n})^n.$

Because I'm assuming we know that $$\exp(-1)=\lim_{n\to\infty}\left(1-\frac{1}{n}\right)^n,$$

we can also write $$\exp(-1)\approx \hat{p}=\frac{1}{m}\sum_{j=1}^m\mathbb{I}\left(i\notin B_j\right).$$

That is, our estimate of $p$ is found by estimating the probability that a specific observation $i$ is omitted from a bootstrap replicate $B_j$, across $m$ such replicates -- i.e. the fraction of bootstrap samples that omit object $i$.
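
As an aside, here is a minimal base-R sketch of this estimator (the values of $n$ and $m$ are just illustrative choices):

n <- 1e3   # number of objects
m <- 1e4   # number of bootstrap replicates
# for each replicate, check whether object 1 is absent from a resample of size n
omitted <- replicate(m, !(1 %in% sample.int(n, size = n, replace = TRUE)))
1/mean(omitted)   # estimates (1 - 1/n)^(-n), which is approximately e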

There are two sources of error in this approximation. Finite $n$ will always mean that the results are approximate, i.e. the estimate is biased. Additionally, $\hat{p}$ will fluctuate around the true value because this is a simulation.

I find this approach somewhat charming because an undergraduate or another person with sufficiently little to do could approximate $e$ using a deck of cards, a pile of small stones, or any other items at hand, in the same vein as a person could estimate $\pi$ using a compass, a straight-edge and some grains of sand. I think it's neat when mathematics can be divorced from modern conveniences like computers.

Results

I conducted several simulations for various numbers of bootstrap replications. Standard errors are estimated using normal intervals.

Note that the choice of $n$, the number of objects being bootstrapped, sets an absolute upper limit on the accuracy of the results, because the Monte Carlo procedure is estimating $p$ and $p$ depends only on $n$. Setting $n$ to be unnecessarily large will just encumber your computer, either because you only need a "rough" approximation to $e$ or because the bias will be swamped by variance due to the Monte Carlo. These results are for $n=10^3$, and $p^{-1}\approx e$ is accurate to the third decimal.

This plot shows that the choice of $m$ has direct and profound consequences for the stability of $\hat{p}$. The blue dashed line shows $p$ and the red line shows $e$. As expected, increasing the sample size produces ever-more accurate estimates $\hat{p}$.

[plot of the Monte Carlo estimates against $m$, with normal-interval error bars]

I wrote an embarrassingly long R script for this. Suggestions for improvement can be submitted on the back of a $20 bill.

library(boot)
library(plotrix)
n <- 1e3

## if p_hat is estimated with 0 variance (in the limit of infinite bootstraps), then the best estimate we can come up with is biased by exactly this much:
approx <- 1/((1-1/n)^n)

dat <- c("A", rep("B", n-1))
indicator <- function(x, ndx)   xor("A"%in%x[ndx], TRUE) ## Because we want to count when "A" is *not* in the bootstrap sample

p_hat <- function(dat, m=1e3){
    foo <- boot(data=dat, statistic=indicator, R=m) 
    1/mean(foo$t)   ## reciprocal of the estimated omission probability, i.e. the estimate of e
} 

reps <- replicate(100, p_hat(dat))

boxplot(reps)
abline(h=exp(1),col="red")

p_mean <- NULL
p_var <- NULL
for(i in 1:10){
    reps <- replicate(2^i, p_hat(dat))
    p_mean[i] <- mean(reps)
    p_var[i] <- sd(reps)
}
plotCI(2^(1:10), p_mean, uiw=qnorm(0.975)*p_var/sqrt(2^(1:10)),xlab="m", log="x", ylab=expression(hat(p)), main=expression(paste("Monte Carlo Estimates of ", tilde(e))))
abline(h=approx, col='red')
Sycorax
  • 4
    +1 It makes a lot of sense. Any chance you can share your code if you wrote it? – Antoni Parellada Feb 04 '16 at 21:49
  • 2
    Even though this can be arbitrarily accurate, ultimately it is unsatisfactory because it only simulates an *approximation* to $e$ rather than $e$ itself. – whuber Feb 05 '16 at 13:44
  • Yes, you are clear about that, but I wanted to emphasize this point. – whuber Feb 05 '16 at 13:49
  • It works for the same reasons [the code in my comment to @Xi'an's answer](http://stats.stackexchange.com/questions/193990/approximate-e-using-monte-carlo-simulation/193997?noredirect=1#comment368871_193993) works. – whuber Feb 05 '16 at 13:53
  • You made it very clear that it was an approximation, and the idea was great and very nicely explained. Can I bother you to ask about the `ndx` part in `indicator`. I never used the `boot` function before, and can't figure out how it connects, and what role it plays. – Antoni Parellada Feb 05 '16 at 17:23
  • `boot` is just a function in R that facilitates bootstrapping. All it does is generate bootstrap collections of indices to pass to other functions. So `ndx` is the batch of indices from a particular bootstrap re-sample step. – Sycorax Feb 05 '16 at 17:39
  • Would there be a way of just generate the sampling process directly with `replicate` - doing away entirely with `boot`? – Antoni Parellada Feb 05 '16 at 19:13
  • 1
    Sure. You would just end with one replicate call inside of another, which is essentially the same as we have now. – Sycorax Feb 05 '16 at 19:16
  • This would be the way I would walk myself through your code avoiding `boot`: `set.seed(4); n – Antoni Parellada Feb 05 '16 at 22:11
  • Well.. take a look at the Poisson's pmf and a clear relationship to taylor expansion of $\exp(x)$ should emerge. – Sycorax Feb 05 '16 at 22:16
  • 1
    @whuber I don't really see the distinction between an arbitrarily accurate approximation to an arbitrarily accurate approximation to $e$, and an arbitrarily accurate approximation to $e$ itself. – jwg Feb 12 '16 at 15:32
  • 1
    @jwg In addition to being conceptually important, it's also practically important because implementing an approximation to an approximation requires keeping track of how accurate each of the two approximations is. But I would have to agree that when both approximations are acceptably good, then the overall approach indeed is fine. – whuber Feb 12 '16 at 16:47
  • 1
    @jwg One way to think about it is in terms of *bias*. In my solution $\mathbb{E}(\hat{p}) – Sycorax Feb 12 '16 at 16:53
16

Solution 1:

For a Poisson $\mathcal{P}(\lambda)$ distribution, $$\mathbb{P}(X=k)=\frac{\lambda^k}{k!}\,e^{-\lambda}$$Therefore, if $X\sim\mathcal{P}(1)$, $$\mathbb{P}(X=0)=\mathbb{P}(X=1)=e^{-1}$$which means you can estimate $e^{-1}$ by a Poisson simulation. And Poisson simulations can be derived from an exponential distribution generator (if not in the most efficient manner).

Remark 1: As discussed in the comments, this is a rather convoluted argument, since simulating from a Poisson distribution, or equivalently an Exponential distribution, may be hard to imagine without involving a log or an exp function... But then W. Huber came to the rescue of this answer with a most elegant solution based on ordered uniforms. It is an approximation, however, since the distribution of a uniform spacing $U_{(i:n)}-U_{(i-1:n)}$ is a Beta $\mathfrak{B}(1,n)$, implying that $$\mathbb{P}(n\{U_{(i:n)}-U_{(i-1:n)}\}\ge 1)=\left(1-\frac{1}{n}\right)^n,$$which converges to $e^{-1}$ as $n$ grows to infinity. As another aside that answers the comments, von Neumann's 1951 exponential generator only uses uniform generations.
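
For concreteness, here is the one-line R version of that ordered-uniforms idea that whuber posted in the comments (also quoted in wolfies' answer below): the reciprocal of the observed frequency of scaled spacings exceeding 1 approximates $e$.

n <- 1e5; 1/mean(n*diff(sort(runif(n+1))) > 1)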

Solution 2:

Another way to achieve a representation of the constant $e$ as an integral is to recall that, when $$X_1,X_2\stackrel{\text{iid}}{\sim}\mathfrak{N}(0,1)$$ then $$(X_1^2+X_2^2)\sim\chi^2_2,$$ which is also an $\mathcal{E}(1/2)$ distribution. Therefore, $$\mathbb{P}(X_1^2+X_2^2\ge 2)=1-\{1-\exp(-2/2)\}=e^{-1}.$$ A second approach to approximating $e$ by Monte Carlo is thus to simulate normal pairs $(X_1,X_2)$ and monitor the frequency of times $X_1^2+X_2^2\ge 2$. In a sense it is the opposite of the Monte Carlo approximation of $\pi$ related to the frequency of times $X_1^2+X_2^2<1$...
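
A minimal R sketch of this second approach (note that `rnorm` itself typically relies on transformations involving logs or exponentials, so this is purely illustrative):

n <- 1e6
x1 <- rnorm(n); x2 <- rnorm(n)
1/mean(x1^2 + x2^2 >= 2)   # frequency of X1^2 + X2^2 >= 2 estimates exp(-1); its reciprocal estimates e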

Solution 3:

My Warwick University colleague M. Pollock pointed out another Monte Carlo approximation, called Forsythe's method: the idea is to run a sequence of uniform generations $u_1,u_2,...$ until $u_{n+1}>u_{n}$. The expectation of the corresponding stopping rule, $N$, which is the number of times the uniform sequence went down, is then $e$, while the probability that $N$ is odd is $e^{-1}$! (Forsythe's method actually aims at simulating from any density of the form $\exp G(x)$, hence is more general than approximating $e$ and $e^{-1}$.)

This is quite parallel to Gnedenko's approach used in Aksakal's answer, so I wonder if one can be derived from the other. At the very least, both have the same distribution with probability mass $1/n!$ for value $n$.
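
Before the vectorised version below, here is a direct, one-trial-at-a-time R sketch of the stopping rule (counting $N$ as the total number of uniforms drawn up to and including the first rise, so that $\mathbb{P}(N>n)=1/n!$ and $\mathbb{E}[N]=\sum_{n\ge 0}1/n!=e$):

forsythe <- function() {
  u_prev <- runif(1); n <- 1
  repeat {
    u <- runif(1); n <- n + 1
    if (u > u_prev) return(n)   # stop at the first rise
    u_prev <- u
  }
}
mean(replicate(1e5, forsythe()))   # close to e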

A quick R implementation of Forsythe's method is to forgo following precisely the sequence of uniforms in favour of larger blocks, which allows for parallel processing:

n = 1e4   # sample size for the block version (a value is not specified in the original snippet)
use=runif(n)
band=max(diff((1:(n-1))[diff(use)>0]))+1   # longest run of decreases, plus one
bends=apply(apply((apply(matrix(use[1:((n%/%band)*band)],nrow=band),
                   2,diff)<0),2,cumprod),2,sum)
Xi'an
  • 15
    As long as one knows how to do Poisson simulation without knowing $e$. – Glen_b Feb 04 '16 at 12:46
  • 1
    A question about this: Don't you use e when you generate draws from a Poisson distribution? So you'd estimate 1/e by already knowing e? There might be other ways, but the method I know involves drawing a bunch of values, u_1, u_2, ... from the standard uniform and then letting X = j -1 where j is the smallest j for which the product u_1*u_2*...*u_j < 1/e^\lambda. EDIT: yes, what @Glen_b said – einar Feb 04 '16 at 12:49
  • 5
    If I call R's rpois() generator, I can pretend I do not know $e$. More seriously, you can generate exponential variates $\mathcal{E}(1)$ [using a $\log$ function rather than $e$] until the sum exceeds $1$ and the resulting number minus one is a Poisson $\mathcal{P}(1)$. – Xi'an Feb 04 '16 at 12:55
  • 6
    Computing $\log$ is tantamount to computing $\exp$, since they are inverses. You can avoid computing any such function in various ways. Here is one solution based on your first answer: `n <- 1e5; 1/mean(n*diff(sort(runif(n+1))) > 1)`. It uses only elementary arithmetic. – whuber Feb 04 '16 at 22:13
  • 1
    You could approximate a sample from a Poisson distribution by taking a sample from a Binomial distribution with large $n$, small $p$. This could be as precise as you want. So you can approximately draw from a Poisson without knowing $e$. – Cliff AB Feb 04 '16 at 23:17
  • For the record, applying my suggestion above to Xi'an's solution essentially reduces down user777's answer. – Cliff AB Feb 04 '16 at 23:34
  • It's a Poisson process. – whuber Feb 06 '16 at 14:26
  • @whuber: I actually beg to disagree that for a given and fixed value of $n$, the uniform spacing probability returns $e^{-1}$, as explained further in the answer. – Xi'an Feb 10 '16 at 12:46
  • 3
    I believe that Forsythe's method is the same as Gnedenko's. Choosing a uniform $x_n$ such that $\sum^n x_i$ is less than 1 is the same as choosing $x_n$ smaller than $1 - \sum^{n-1} x_i$, and if we are successful, $1 - \sum^{n} x_i$ is conditionally uniformly distributed between $1 - \sum^{n-1} x_i$ and 0. – jwg Feb 12 '16 at 08:31
  • 3
    I wasn't aware of Forsythe's approach. However, it's linked to something else very interesting. If instead of stopping at $n+1$ you keep sampling, then the expectation of the distance from $n$ to the next bottom is exactly 3. – Aksakal Feb 17 '16 at 14:40
  • 1
    @Aksakal: another interesting property, indeed! Which leads to a Monte Carlo approximation of 3, maybe not currently the most pressing estimation problem!, but still a nice property! – Xi'an Feb 17 '16 at 14:55
  • A thread related to Solution #3: http://stats.stackexchange.com/questions/148962. Leaving this comment to link the threads. – amoeba Mar 06 '16 at 00:38
6

Not a solution ... just a quick comment that is too long for the comment box.

Aksakal

Aksakal posted a solution where we calculate the expected number of standard uniform draws that must be taken so that their sum exceeds 1. In Mathematica, my first formulation was:

mrM := NestWhileList[(Random[] + #) &, Random[], #<1 &]

Mean[Table[Length[mrM], {10^6}]] 

EDIT: Just had a quick play with this, and the following code (same method - also in Mma - just different code) is about 10 times faster:

Mean[Table[Module[{u=Random[], t=1},  While[u<1, u=Random[]+u; t++]; t] , {10^6}]]

Xian / Whuber

Whuber has suggested fast cool code to simulate Xian's solution 1:

R version: n <- 1e5; 1/mean(n*diff(sort(runif(n+1))) > 1)

Mma version: n=10^6; 1. / Mean[UnitStep[Differences[Sort[RandomReal[{0, n}, n + 1]]] - 1]]

which he notes is 20 times faster than the first code (or about twice as fast as the new code above).

Just for fun, I thought it would be interesting to see whether both approaches are equally efficient (in a statistical sense). To do so, I generated 2000 estimates of $e$ using:

  • Aksakal's method: dataA
  • Xian's method 1 using whuber code: dataB

... both in Mathematica. The following diagram contrasts nonparametric kernel density estimates of the resulting dataA and dataB data sets.

[kernel density estimates of the 2000 estimates of $e$ from each method]

So, while whuber's code (red curve) is about twice as fast, the method does not appear to be as 'reliable'.

wolfies
  • 2
    A vertical line at the location of the true value would vastly improve this image. – Sycorax Feb 05 '16 at 19:17
  • 1
    It's a very interesting observation, thank you. Since the half-width will scale quadratically with the size of the simulation and the half-width of Xi'an's method is about twice that of Aksakal's, then running four times as many iterations will make them equally accurate. The question of how much effort is needed in each iteration remains: if one iteration of Xi'an's method takes less than one-quarter the effort, then that method would still be more efficient. – whuber Feb 05 '16 at 19:29
  • @user777 Updated: note that the sample size is small (2000 samples of 10^6), which is why I smoothed using a NPKDE plot ... the purpose was to compare the distribution (not to test unbiasedness). – wolfies Feb 05 '16 at 19:30
  • 2
    I believe the situation becomes clear when you compare the numbers of realizations of random variables required in both methods rather than the nominal value of $n$. – whuber Feb 05 '16 at 20:01
  • @whuber I have added new Mma code (still for Aksakal's method) that is now 10 times faster. – wolfies Feb 07 '16 at 06:48
  • 1
    @whuber wrote: `running four times as many iterations will make them equally accurate` ///// ..... Just had a quick play with this: increasing the number of sample points used in Xian's Method 1 from $10^6$ to 6 x $10^6$ (i.e. 6 times the number of points) produces a similar curve to Aksaksal. – wolfies Feb 07 '16 at 14:45
  • 1
    Well done with the code--it will be difficult to improve much on that. – whuber Feb 07 '16 at 14:47
2

Method requiring an ungodly amount of samples

First you need to be able to sample from a normal distribution. Assuming you are going to exclude the use of the function $f(x) = e^x$, or look-up tables derived from that function, you can produce approximate samples from the normal distribution via the CLT. For example, if you can sample from a uniform(0,1) distribution, then $\sqrt{12n}\left(\bar x - \tfrac{1}{2}\right) \;\dot\sim\; N(0,1)$. As pointed out by whuber, to have the final estimate approach $e$ as the sample size approaches $\infty$, it would be required that the number of uniform samples used also approaches $\infty$ as the sample size approaches infinity.

Now, if you can sample from a normal distribution, with large enough samples, you can get consistent estimates of the density of $N(0,1)$. This can be done with histograms or kernel smoothers (but be careful not to use a Gaussian kernel to follow your no $e^{x}$ rule!). To get your density estimates to be consistent, you will need to let your df (number of bins in histogram, inverse of window for smoother) approach infinity, but slower than the sample size.

So now, with lots of computational power, you can approximate the density of a $N(0,1)$ variable, i.e. $\hat \phi(x)$. Since $\phi(\sqrt{2}) = (2 \pi)^{-1/2} e^{-1}$, your estimate is $\hat{e} = \big(\hat \phi(\sqrt{2})\, \sqrt{2 \pi}\big)^{-1}$.
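
A rough R sketch of this first method (the number of uniforms per draw, the number of draws, and the bin half-width are all illustrative choices; a rectangular kernel is used to respect the no-$e^x$ rule, and the constants $\sqrt{2}$ and $\sqrt{2\pi}$ are used directly):

n1 <- 48       # uniforms averaged per approximate normal draw (CLT)
n2 <- 2e5      # number of approximate normal draws
z  <- replicate(n2, sqrt(12*n1)*(mean(runif(n1)) - 0.5))
h  <- 0.05     # half-width of the bin centred at sqrt(2)
phi_hat <- mean(abs(z - sqrt(2)) < h)/(2*h)   # rectangular-kernel density estimate at sqrt(2)
1/(phi_hat*sqrt(2*pi))                        # estimate of e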

If you want to go totally nuts, you can even estimate $\sqrt{2}$ and $\sqrt{2\pi}$ using the methods you discussed earlier.

Method requiring very few samples, but causing an ungodly amount of numerical error

A completely silly, but very efficient, answer based on a comment I made:

Let $X_1, \dots, X_n \sim \text{Uniform}(-1, 1)$ with sample mean $\bar{X}_n$. Define $Y_n = |\bar{X}_n|^n$ and $\hat e = (1 - Y_n)^{-1/Y_n}$.

This will converge very fast, but also run into extreme numerical error.

whuber pointed out that this uses the power function, which typically calls the exp function. This could be sidestepped by discretizing $Y_n$ so that $1/Y_n$ is an integer, replacing the power with repeated multiplication. It would be required that, as $n \rightarrow \infty$, the discretization of $Y_n$ gets finer and finer, and it would have to exclude $Y_n = 0$. With this, the estimator would theoretically (i.e., in a world in which numerical error does not exist) converge to $e$, and quite fast!
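
A tiny R sketch of this second estimator, before any discretisation (even a handful of samples already gives a good estimate, while for larger $n$ the value $1-Y_n$ rounds to 1 in floating point and the result degenerates, which is the numerical error alluded to above):

n <- 5
y <- abs(mean(runif(n, -1, 1)))^n   # Y_n is already very small
(1 - y)^(-1/y)                      # close to e; once y drops below machine epsilon, 1 - y rounds to 1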

Cliff AB
  • 2
    The CLT approach is less than satisfactory because ultimately you know these values are *not* normally distributed. But there are plenty of ways to generate Normal variates without needing $e$ or logarithms: the Box-Muller method is one. That one, though, requires trig functions and (at a fundamental level) those are the same as exponentials. – whuber Feb 05 '16 at 13:47
  • 1
    @whuber: I did not use Box-Muller because its required log transform is too directly related to the exponential, in my book. I **would** have reflexively allowed cos and sin, but that was only because I had forgotten about complex analysis for a moment, so good point. – Cliff AB Feb 05 '16 at 15:11
  • 1
    However, I would take issue with the idea that the generated normal approximation is the weak point of this idea; the density estimation is even weaker! You can think of this idea as having two parameters: $n_1$, the number of uniforms used in your "approximated normal", and $n_2$, the number of approximated normals used to estimate the density $\phi(\sqrt{2})$. As both $n_1$ and $n_2$ approach $\infty$, the estimator will approach $e$. In fact, I'm very confident the convergence rate would be much more limited by $n_2$ than $n_1$; non-parametric density estimation has a slow convergence rate! – Cliff AB Feb 05 '16 at 15:15
2

Here is another way it can be done, though it is quite slow. I make no claim to efficiency, but offer this alternative in the spirit of completeness.

Contra Xi'an's answer, I will assume for the purposes of this question that you are able to generate and use a sequence of $n$ uniform pseudo-random variables $U_1, \cdots , U_n \sim \text{IID U}(0,1)$ and you then need to estimate $e$ by some method using basic arithmetic operations (i.e., you cannot use logarithmic or exponential functions or any distributions that use these functions).$^\dagger$ The present method is motivated by a simple result involving uniform random variables:

$$\mathbb{E} \Bigg( \frac{\mathbb{I}(U_i \geqslant 1 / e) }{U_i} \Bigg) = \int \limits_{1/e}^1 \frac{du}{u} = 1.$$

Estimating $e$ using this result: We first order the sample values into descending order to obtain the order statistics $u_{(1)} \geqslant \cdots \geqslant u_{(n)}$ and then we define the partial sums:

$$S_n(k) \equiv \frac{1}{n} \sum_{i=1}^k \frac{1}{u_{(i)}} \quad \text{for all } k = 1, .., n.$$

Now, let $m \equiv \min \{ k \mid S_n(k) \geqslant 1 \}$ and then estimate $1/e$ by interpolation of the ordered uniform variables. This gives an estimator for $e$ given by:

$$\hat{e} \equiv \frac{2}{u_{(m)} + u_{(m+1)}}.$$

This method has some slight bias (owing to the linear interpolation of the cut-off point for $1/e$) but it is a consistent estimator for $e$. The method can be implemented fairly easily, but it requires sorting the values, which makes it slow and more computationally intensive than deterministic calculation of $e$.

Implementation in R: The method can be implemented in R using runif to generate uniform values. The code is as follows:

EST_EULER <- function(n) { U <- sort(runif(n), decreasing = TRUE);   # descending order statistics
                           S <- cumsum(1/U)/n;                       # partial sums S_n(k)
                           m <- min(which(S >= 1));                  # first index where the partial sum reaches 1
                           2/(U[m-1]+U[m]); }                        # interpolate across the cut-off to estimate e

Implementing this code gives convergence to the true value of $e$, but it is very slow compared to deterministic methods.

set.seed(1234);

EST_EULER(10^3);
[1] 2.715426

EST_EULER(10^4);
[1] 2.678373

EST_EULER(10^5);
[1] 2.722868

EST_EULER(10^6); 
[1] 2.722207

EST_EULER(10^7);
[1] 2.718775

EST_EULER(10^8);
[1] 2.718434

> exp(1)
[1] 2.718282

$^\dagger$ I take the view that we want to avoid any method that makes use of any transformation that involves an exponential or logarithm. If we can use densities that use exponentials in their definition then it is possible to derive $e$ from these algebraically using a density call.

Ben
0

Here is a Python version of the same simulation (the sum-of-uniforms approach from Aksakal's answer), if anyone is curious:

import random

print("Number of iterations: ", end="")
n = int(input())

sum_total = 0
for _ in range(n):
    temp = 0
    counter = 0
    while temp < 1:          # draw uniforms until the running sum exceeds 1
        temp += random.random()
        counter += 1

    sum_total += counter     # record the number of draws for this trial

print(sum_total/n)           # average number of draws, approximately e
Andrew
0

If you do not have a calculator (i.e., you cannot compute the exponential $e$ indirectly by using related functions, such as sampling from a normal or an exponential distribution) and you only have coin flips or dice rolls* available to you, then you could use the following puzzle to estimate the number $e$:

The number $e$ appears in the expression for the expectation value in the frog problem with negative steps. We have $E[J_1] = 2e-2$, so we could approximate $e$ from an estimate $\hat{\mu}_{J_1}$ of $E[J_1] = \mu_{J_1}$ using $\hat{e} = 0.5\,\hat{\mu}_{J_1}+1$.


*and dice rolls could be constructed from coin flips if you want to be more restrictive

Sextus Empiricus