Mathematica's random number generator deviating from binomial probability?

Question

So, let's say you flip a coin 10 times, and call that 1 "event". If you run, 1,000,000 of these "events", what is the proportion of events that have heads between 0.4 and 0.6? Binomial probability would suggest this to be about 0.65, but my Mathematica code is telling me about 0.24

Here's my syntax:

In[2]:= X:= RandomInteger[];
In[3]:= experiment[n_]:= Apply[Plus, Table[X, {n}]]/n;
In[4]:= trialheadcount[n_]:= .4 < Apply[Plus, Table[X, {n}]]/n < .6
In[5]:= sample=Table[trialheadcount[10], {1000000}]
In[6]:= Count[sample2,True];
Out[6]:= 245682

Where is the mishap?

perhaps this would be better suited to mathematica stackexchange http://mathematica.stackexchange.com/ — Jeromy Anglim, Feb 02 '15 at 04:56
@JeromyAnglim In this case I suspect the problem is probably with the reasoning rather than strictly the coding. — Glen_b, Feb 02 '15 at 05:34
@Glen_b I guess the main thing is that there is a good answer somewhere on the internet, which you appear to have provided. :-) — Jeromy Anglim, Feb 02 '15 at 05:36

Glen_b · Answer 1 · 2015-02-02T08:23:08.913

The mishap is the use of strict less than.

With ten tosses, the only way to get a proportion-of-heads outcome strictly between 0.4 and 0.6 is if you get exactly 5 heads. That has a probability of about 0.246 (${{_{10}}\choose{^5}}(\frac{_1}{^2})^{10}\approx 0.246$), which is about what your simulations (correctly) give.

If you include 0.4 and 0.6 in your limits, (i.e. 4, 5 or 6 heads in 10 tosses) the result has a probability of about 0.656, much as you expected.

Your first thought should not be a problem with the random number generator. That kind of problem would have been obvious in a heavily used package like Mathematica long before now.

Ironically, @TimMcKnight demonstrated binomial probability for us. — Simon Kuang, Feb 02 '15 at 18:10

score 8 · Answer 2 · answered Feb 02 '15 at 08:18

Some comments about the code you wrote:

You defined experiment[n_] but never used it, instead repeating its definition in trialheadcount[n_].
experiment[n_] could be much more efficiently programmed (without using the built-in command BinomialDistribution) as Total[RandomInteger[{0,1},n]/n and this would also make X unnecessary.
Counting the number of cases where experiment[n_] is strictly between 0.4 and 0.6 is more efficiently accomplished by writing Length[Select[Table[experiment[10],{10^6}], 0.4 < # < 0.6 &]].

But, for the actual question itself, as Glen_b points out, the binomial distribution is discrete. Out of 10 coin tosses with $x$ observed heads, the probability that the sample proportion of heads $\hat p = x/10$ is strictly between 0.4 and 0.6 is actually just the case $x = 5$; i.e., $$\Pr[X = 5] = \binom{10}{5} (0.5)^5 (1-0.5)^5 \approx 0.246094.$$ Whereas, if you were to calculate the probability that the sample proportion is between 0.4 and 0.6 inclusive, that would be $$\Pr[4 \le X \le 6] = \sum_{x=4}^6 \binom{10}{x} (0.5)^x (1-0.5)^{10-x} = \frac{672}{1024} \approx 0.65625.$$ Therefore, you need only modify your code to use 0.4 <= # <= 0.6 instead. But of course, we could also write

Length[Select[RandomVariate[BinomialDistribution[10,1/2],{10^6}], 4 <= # <= 6 &]]

This command is approximately 9.6 times faster than your original code. I imagine someone even more proficient than I am at Mathematica could speed it up even further.

You can speed up your code by another factor of 10 by using `Total@Map[Counts@RandomVariate[BinomialDistribution[10, 1/2], 10^6], {4, 5, 6}]`. I suspect `Counts[]`, being a built-in function, is highly optimized in comparison to `Select[]`, which has to work with arbitrary predicates. — David Zhang, Feb 02 '15 at 14:03

gwr · Answer 3 · 2015-11-15T19:03:42.927

Doing Probability Experiments in Mathematica

Mathematica offers a very comfortable framework to work with probabilities and distributions and -- while the main issue of appropriate limits has been addressed -- I would like to use this question to make this clearer and maybe useful as a reference.

Let's simply make the experiments repeatable and define some plot options to fit our taste:

SeedRandom["Repeatable_151115"];
$PlotTheme = "Detailed";
SetOptions[Plot, Filling -> Axis];
SetOptions[DiscretePlot, ExtentSize -> Scaled[0.5], PlotMarkers -> "Point"];

Working with parametric distributions

We can now define the asymptotical distribution for one event which is the proportion $\pi$ of heads in $n$ throws of a (fair) coin:

distProportionTenCoinThrows = With[
    {
        n = 10, (* number of coin throws *)
        p = 1/2 (* fair coin probability of head*)
    },
    (* derive the distribution for the proportion of heads *)
    TransformedDistribution[
        x/n,
        x \[Distributed] BinomialDistribution[ n, p ]
    ];

With[
    {
        pr = PlotRange -> {{0, 1}, {0, 0.25}}
    },
    theoreticalPlot = DiscretePlot[
        Evaluate @ PDF[ distProportionTenCoinThrows, p ],
        {p, 0, 1, 0.1},
        pr
    ];
    (* show plot with colored range *)
    Show @ {
        theoreticalPlot,
        DiscretePlot[
            Evaluate @ PDF[ distProportionTenCoinThrows, p ],
            {p, 0.4, 0.6, 0.1},
            pr,
            FillingStyle -> Red,
            PlotLegends -> None
        ]
    }
]

Which gives us the plot of the discrete distribution of proportions:

We can use the distribution immediately to calculate probabilities for $Pr[\,0.4 \leq \pi \leq 0.6\, |\,\pi \sim B(10,\frac{1}{2})]$ and $Pr[\,0.4 < \pi < 0.6\, |\,\pi \sim B(10,\frac{1}{2})]$:

{
    Probability[ 0.4 <= p <= 0.6, p \[Distributed] distProportionTenCoinThrows ],
    Probability[ 0.4 < p < 0.6, p \[Distributed] distProportionTenCoinThrows ]
} // N

{0.65625, 0.246094}

Doing Monte Carlo Experiments

We can use the distribution for one event to repeatedly sample from it (Monte Carlo).

distProportionsOneMillionCoinThrows = With[
    {
        sampleSize = 1000000
    },
    EmpiricalDistribution[
        RandomVariate[
            distProportionTenCoinThrows,
            sampleSize
        ]
    ]
];

empiricalPlot = 
    DiscretePlot[
        Evaluate@PDF[ distProportionsOneMillionCoinThrows, p ],
        {p, 0, 1, 0.1}, 
        PlotRange -> {{0, 1}, {0, 0.25}} , 
        ExtentSize -> None, 
        PlotLegends -> None, 
        PlotStyle -> Red
    ]
]

Comparing this with the theoretical/asymptotical distribution shows that everthing pretty much fits in:

Show @ {
   theoreticalPlot,
   empiricalPlot
}

You can find a similar post with more background information with regard to *Mathematica* on [Mathematica SE](http://mathematica.stackexchange.com/questions/98834/how-to-model-rolling-a-pair-of-dice-100-times-monte-carlo/98913#98913). — gwr, Nov 15 '15 at 18:05

Mathematica's random number generator deviating from binomial probability?

3 Answers3

Doing Probability Experiments in Mathematica